Re: [Python-Dev] nice()

2006-02-13 Thread Smith
| From: Josiah Carlson <[EMAIL PROTECTED]>
| "Alan Gauld" <[EMAIL PROTECTED]> wrote:
|| However I do dislike the name nice() - there is already a nice() in
|| the 
|| os module with a fairly well understood function. 

perhaps trim(), nearly(), about(), defer_the_pain_of() :-) I've waited to think 
of names until after writing this. The reason for the last name option may 
become apparent after reading the rest of this post.

|| But I'm sure some
|| time with a thesaurus can overcome that single mild objection. :-)
| 
| Presumably it would be located somewhere like the math module.

I would like to see it as accessible as round, int, float, and repr. I really 
think a round-from-the-left is a nice tool to have. It's obviously very easy to 
build your own if you know what tools to use. Not everyone is going to be 
reading the python-dev or similar lists, however, and so having it handy would 
be nice.
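
For reference, here is a minimal sketch of the kind of round-from-the-left 
helper being discussed; it uses the %e-format trick from Tim Peters that the 
nice() post below builds on, and the helper's name is just a placeholder:

###
def round_from_left(x, digits):
    """Round x to `digits` significant digits, counted from the
    leftmost non-zero digit rather than from the decimal point."""
    return float('%.*e' % (digits - 1, x))

print round_from_left(0.00326, 2)  # 0.0033
print round_from_left(1234.5, 2)   # 1200.0
###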

| From: Greg Ewing <[EMAIL PROTECTED]>
| Smith wrote:
| 
|| When teaching some programming to total newbies, a common
|| frustration is how to explain why a==b is False when a and b are
|| floats computed by different routes which ``should'' give the
|| same results (if arithmetic had infinite precision).
| 
| This is just a special case of the problems inherent
| in the use of floating point. As with all of these,
| papering over this particular one isn't going to help
| in the long run -- another one will pop up in due
| course.
| 
| Seems to me it's better to educate said newbies not
| to use algorithms that require comparing floats for
| equality at all. 

I think a helper function like nice() is a middle-ground solution to the 
problem: it falls short of using only decimal or rational values for numbers, 
but it does better than requiring a test of the error between floating values 
that should be equal but aren't because of alternate methods of computation. 
Just as with the argument for making true division the default behavior for 
the computational environment, it seems a little unfriendly to expect the more 
casual user to have to worry that 3*0.1 is not the same as 3/10.0. I know--they 
really are different, and one should (eventually) understand why, but does 
anyone really want the warts of floating point representation popping up 
in their work if they could be avoided, or at least easily circumvented?

I know you know why the following numbers show up as not equal, but this is an 
example of the pain encountered in a reasonably simple exercise, say, computing 
the bin boundaries for a histogram where the bins have a width of 0.1: 

###
>>> for i in range(20):
...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
... 
3 0.30000000000000004 0.29999999999999999 0.3 0.3
6 0.60000000000000009 0.59999999999999998 0.6 0.6
7 0.70000000000000007 0.69999999999999996 0.7 0.7
12 1.2000000000000002 1.2 1.2 1.2
14 1.4000000000000001 1.3999999999999999 1.4 1.4
17 1.7000000000000002 1.7 1.7 1.7
19 1.9000000000000001 1.8999999999999999 1.9 1.9
###

For, say, garden variety numbers that aren't full of garbage digits resulting 
from fp computation, the boundaries computed as 0.1*i are not going to agree 
with such simple numbers as 1.4 and 0.7.

Would anyone (and I truly don't know the answer) really mind if all floating 
point values were filtered through whatever lies behind the str() manipulation 
of floats before the computation was made? I'm not saying that strings would be 
compared, but that float(str(x)) would be compared to float(str(y)) if x were 
being compared to y as in x<=y. If this could be done, wouldn't a lot of grief 
just go away and not require the use of decimal or rational types for many 
users? 
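
To make the idea concrete, here is a minimal sketch of that str()-filtered 
comparison (the helper name is made up for illustration):

###
def str_eq(x, y):
    # compare the float(str()) forms rather than the raw floats
    return float(str(x)) == float(str(y))

print 3*0.1 == 3/10.0        # False
print str_eq(3*0.1, 3/10.0)  # True: both round-trip through '0.3'
###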

I understand that the above really is just a patch over the problem, but I'm 
wondering if it moves the problem far enough away that most users wouldn't have 
to worry about it. Here, for example, are the first values where the running 
sum doesn't equal the straight multiple of some step size:

###
>>> def go(x,n=1000):
...  s=0;i=0
...  while s<n:
...   i+=1;s+=x
...   if nice(s)<>nice(i*x):
...    return i,s,i*x,`s`,`i*x`
... 
>>> for i in range(1,100):
...  print i, go(i/1000.)
...  print
...  
1 (60372 60.371999 60.372 60.3719994 60.372)

2 (49645 99.28 99.29 99.2849998 99.296)
###

The soonest the breakdown occurs is at the 22496th multiple of 0.041 for the 
range given above. By the time someone starts getting into needs of iterating 
so many times, they will be ready to use the more sophisticated option of 
nice()--the one which makes it more versatile and less of a patch--the option 
to round the answers to a given number of leading digits rather than a given 
decimal precision like round. nice() gives a simple way to think about making a 
comparison of floats. You just ha

Re: [Python-Dev] nice()

2006-02-15 Thread Smith
I am reluctantly posting here since this is of less intense interest than other 
things being discussed right now, but this is related to the areclose proposal 
that was discussed here recently. 

The following discussion ends with things that python-dev might want to 
consider in terms of adding a function that allows something other than the 
default 12- and 17-digit precision representations of numbers that str() and 
repr() give. Such a function (like nice(), perhaps named trim()?) would provide 
a way to convert fp numbers that are being used in comparisons into a precision 
that reflects the user's preference.  

Everyone knows that fp numbers must be compared with caution, but there is a 
void in the relative-error department for exercising such caution, thus the 
proposal for something like 'areclose'. The problem with areclose(), however, 
is that it only solves one part of the problem that needs to be solved if two 
fp's *are* going to be compared: if you are going to check if a < b you would 
need to do something like 

not areclose(a,b) and a < b

With something like trim() (a.k.a nice()) you could do

trim(a) < trim(b)

to get the comparison to 12-digit default precision or arbitrary precision with 
optional arguments, e.g. to 3 digits of precision:

trim(a,3) < trim(b,3)
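
For instance, using the nice()/trim() implementation from the original post 
(a quick sanity check, not part of the proposal):

>>> a = 2.1 - 2; b = 0.1
>>> a == b, a < b
(False, False)
>>> trim(a) == trim(b)
True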

From a search on the documentation, I don't see that the name trim() is taken 
yet.

OK, comments responding to Greg follow.


| From: Greg Ewing [EMAIL PROTECTED]
| Smith wrote:
| 
|| computing the bin boundaries for a histogram
|| where the bins have a width of 0.1:
|| 
|| >>> for i in range(20):
|| ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
|| ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
| 
| I don't see how that has any relevance to the way bin boundaries
| would be used in practice, which is to say something like
| 
|   i = int(value / 0.1)
|   bin[i] += 1 # modulo appropriate range checks

This is just masking the issue by converting numbers to integers. The fact 
remains that two mathematically equal numbers can have two different internal 
representations with one being slightly larger than the exact integer value and 
one smaller:

>>> a=(23*.1)*10;a
23.000000000000004
>>> b=2.3/.1;b
22.999999999999996
>>> int(a/.1),int(b/.1)
(230, 229)

Part of the answer in this context is to use round() rather than int so you are 
getting to the closest integer.
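
For example, with the a and b above, rounding heals the discrepancy:

>>> int(round(a/.1)),int(round(b/.1))
(230, 230)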

 
|| For, say, garden variety numbers that aren't full of garbage digits
|| resulting from fp computation, the boundaries computed as 0.1*i are
|| not going to agree with such simple numbers as 1.4 and 0.7.
| 
| Because the arithmetic is binary rather than decimal. But even using
| decimal, you get the same sort of problems using a bin width of
| 1.0/3.0. The solution is to use an algorithm that isn't sensitive
| to those problems, then it doesn't matter what base your arithmetic
| is done in.

Agreed.

| 
|| I understand that the above really is just a patch over the problem,
|| but I'm wondering if it moves the problem far enough away that most
|| users wouldn't have to worry about it.
| 
| No, it doesn't. The problems are not conveniently grouped together
| in some place you can get away from; they're scattered all over the
| place where you can stumble upon one at any time.
| 

Yes, even a simple computation of the wrong type can lead to unexpected 
results. I agree.

|| So perhaps this brings us back to the original comment that "fp
|| issues are a learning opportunity." They are. The question I have is
|| "how soon do they need to run into them?" Is decreasing the likelihood
|| that they will see the problem (but not eliminate it) a good thing
|| for the python community or not?
| 
| I don't think you're doing anyone any favours by trying to protect
| them from having to know about these things, because they *need* to
| know about them if they're not to write algorithms that seem to
| work fine on tests but mysteriously start producing garbage when
| run on real data, possibly without it even being obvious that it is
| garbage.

Mostly I agree, but if you go to that extreme, why not drop floating point 
comparisons altogether and force the programmer to convert everything to 
integers, making their own bias evident (like truncating with int() rather 
than rounding to the nearest int)? Or drop the fp comparison operators and 
introduce fp comparison functions that require tolerance terms, again making 
the assumptions transparent: 

def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    return not areclose(x, y, rel_err, abs_err) and int(x-y) <= 0

print lt(a, b, 0, 1e-10) # --> False (they are equal at that tolerance)
print lt(a, b, 0, 1e-20) # --> True (a is less than b at that tolerance)

The fact is, we make things easier and let the programmer shoot themselves in 
the foot if they want t

Re: [Python-Dev] math.areclose ...?

2006-02-15 Thread Smith
A problem that I pointed out with the proposed areclose() function is that it 
has within it a fp comparison. If such a function is to have greater utility, 
it should allow the user to specify how significant to consider the computed 
error. A natural extension of being able to tell if 2 fp numbers are close is 
to make a more general comparison. For that purpose, a proposed fpcmp function 
is appended. From that, fp boolean comparison operators (le, gt, ...) are 
easily constructed.

Python allows fp comparison, and this is a significant source of surprises and 
learning experiences. Are any of these proposals of interest for providing 
tools to make fp comparisons more intelligently?

###
#new proposal for the areclose() function
def areclose(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return True if |x-y| is no greater than atol, or no greater
    than rtol times the absolute value of the larger of x and y;
    otherwise False. The comparison is made by computing a
    difference that should be <= 0 if the two numbers satisfy
    either condition; prec controls the precision of the value
    that is obtained, e.g. 8.3266726846886741e-17 is obtained for
    (2.1-2)-.1, but rounded to the 12th digit (the default
    precision) the value 0.0 is returned, indicating that at that
    precision there is no (significant) error."""

    diff = abs(x-y)
    return round(diff-atol,prec)<=0 or \
           round(diff-rtol*max(abs(x),abs(y)),prec)<=0

#fp cmp
def fpcmp(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return 0 if x and y are close in the absolute or
    relative sense. If not, then return -1 if x < y or +1 if x > y.
    Note: prec controls how many digits of the error are retained
    when checking for closeness."""

    if areclose(x,y,atol,rtol,prec):
        return 0
    else:
        return cmp(x,y)

# fp comparison functions
def lt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==-1
def le(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (-1,0)
def eq(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==0
def gt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==1
def ge(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (0,1)
def ne(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)<>0
###
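
A few quick sanity checks with the functions above (note that the test that 
failed with the original areclose() now passes):

###
>>> print areclose(2, 2.1, .1, 0) #see if 2 and 2.1 are within 0.1 of each other
True
>>> print eq(.1*3, .3)
True
>>> print gt(.1*3, .3, atol=0, rtol=0, prec=17)
True
###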


Re: [Python-Dev] math.areclose ...?

2006-02-07 Thread Smith
Raymond Hettinger wrote:
| [Chris Smith]
|| Does it help to spell it like this?
|| 
|| def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
||     diff = abs(x - y)
||     ave = (abs(x) + abs(y))/2
||     return diff < absolute_err or diff/ave < relative_err
| 
| There is a certain beauty and clarity to this presentation; however,
| it is problematic numerically:
| 
| * the division by either absolute_err and relative_err can overflow or
| trigger a ZeroDivisionError

I'm not dividing by either of these values, so that shouldn't be a problem. As 
long as absolute_err is not 0, the first test would catch the possibility 
that x==y==ave==0. (see below)

As for the overflow, does your version of python overflow? Mine (2.4) just 
returns 1.#INF which still computes as a number:

###
>>> 1.79769313486e+308+1.79769313486e+308
1.#INF
>>> inf=_
>>> inf>1
True
>>> inf<1
False
>>> 2./inf
0.0
>>> inf/inf
-1.#IND
###


There is a problem with dividing by 'ave' if the x and y are at the floating 
point limits, but the symmetric behaving form (presented by Scott Daniels) will 
have the same problem. The following format for close() has the same semantic 
meaning but avoids the overflow possibility and avoids extra work for the case 
when abs_tol=0 and x==y:

###
def close(x, y, abs_tol=1.e-8, rel_tol=1.e-5):
    '''Return True if |x-y| < abs_tol or |x-y|/ave(|x|,|y|) < rel_tol.
    The average is not computed directly so as to avoid overflow for
    numbers close to the floating point upper limit.'''
    if x==y: return True
    diff = abs(x - y)
    if diff < abs_tol: return True
    f = rel_tol/2.
    if diff < f*abs(x) + f*abs(y): return True
    return False
###
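
A few quick checks of the behavior (the values here are just for illustration):

###
>>> close(1.1*3, 3.3)
True
>>> close(1e9, 1e9+1)  # caught by the relative tolerance
True
>>> close(1e9, 1.1e9)  # not within either tolerance
False
###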

| 
| * the 'or' part of the expression can introduce an unnecessary
| discontinuity in the first derivative.
|
If a value other than a boolean were being returned, I could see the desire 
for continuity in the derivative. Since the original form presents a boolean 
result, however, I'm having a hard time seeing how the continuity issue comes 
into play.
 
| The original Numeric definition is likely to be better for people who
| know what they're doing; however, I still question whether it is an
| appropriate remedy for the beginner issue
| of why 1.1 + 1.1 + 1.1 doesn't equal 3.3.
|

I'm in total agreement. Being able to see that math.areclose(1.1*3,3.3) is True 
but 1.1*3==3.3 is False is not going to make them feel much better. They are 
going to have to face the floating point issue. 

As for the experienced user, perhaps such a function would be helpful. Maybe it 
would be better to require that the tolerances be given explicitly, rather than 
defaulting, so as to make clear which test is being used when only one test is 
intended:

close(x,y,rel_tol=1e-5)
close(x,y,abs_tol=1e-8)

/c


[Python-Dev] small floating point number problem

2006-02-07 Thread Smith
I just ran into a curious behavior with small floating points, trying to find 
the limits of them on my machine (XP). Does anyone know why the '0.0' is 
showing up for one case below but not for the other? According to my tests, the 
smallest representable float on my machine is much smaller than 1e-308: it is

2.470328229206234e-325

but I can only create it as a product of two numbers, not directly. Here is an 
attempt to create the much larger 1e-308:

>>> a=1e-308
>>> a
0.0
>>> a==0
True  <-- it really is 0; this is not a repr issue
>>> b=.1*1e-307
>>> b
9.9999999999999991e-309
>>> a==b
False  <-- they really are different
>>> 

Also, I see that there is some graininess in the numbers at the low end, but 
I'm guessing that there is some issue with floating points that I would need to 
read up on again. The above dilemma is a little more troublesome.

>>> m=2.470328229206234e-017
>>> s=1e-307
>>> m*s
4.9406564584124654e-324  # 2x too large
>>> 2*m*s
4.9406564584124654e-324
>>> 3*m*s==4*m*s
True
>>> 

/c



Re: [Python-Dev] [BULK] Python-Dev Digest, Vol 31, Issue 37

2006-02-08 Thread Smith
| From: Michael Hudson <[EMAIL PROTECTED]>
| Guido van Rossum <[EMAIL PROTECTED]> writes:
| 
|| On 2/8/06, Patrick Collison <[EMAIL PROTECTED]> wrote:
||| And to think that people thought that keeping "lambda", but changing
||| the name, would avoid all the heated discussion... :-)
|| 
|| Note that I'm not participating in any attempts to "improve" lambda.
|| 
|| Just about the only improvement I'd like to see is to add parentheses
|| around the arguments, so you'd write lambda(x, y): x**y instead of
|| lambda x, y: x**y.
| 
| That would seem to be a bad idea, as it means something already:
| 
| >>> f = lambda (x,y): x + y
| >>> t = (1,2)
| >>> f(t)
| 3
| 
| Cheers,
| mwh

Hey! I didn't know you could do that. I'm happy. My lambdas just grew 
parentheses on the arguments:

>>> f=lambda(x):x+1
>>> f(2)
3
>>> def go(f,x):
...  print f(x)
...  
>>> go(lambda(x):x+1,1)
2
>>> go(lambda(x,y):x+y,(1,3))
4
>>> 

/c


[Python-Dev] py3k and not equal; re names

2006-02-09 Thread Smith
I'm wondering if it's just "foolish consistency" (to quote PEP 8) that is 
calling for the dropping of <> in preference of only !=. I've used the former 
since the beginning in everything from BASIC, Fortran, ClarisWorks, Excel, 
Gnumeric, and Python. I tried to find a rationale for the dropping--perhaps 
the symbol is being reserved to represent some other object (like an empty 
set). I'm sure there must be some reason, but I just want to put in a vote for 
keeping this variety.

And another suggestion for py3k would be to increase the correspondence between 
string methods and re methods, e.g. since re.match and string.startswith are 
checking for the same thing, was there a reason to introduce the new names? The 
same question applies to string.find and re.search.

Instead of having to learn another set of method names to use re, it would be 
nice to have the only change be the pattern used for the method.  Here is a 
side-by-side listing of methods in both modules that are candidates for 
consistency--hopefully not "foolish" ;-)

  string        re
  ------        --
  find          search
  startswith    match
  split         split
  replace       sub
  NA            subn
  NA            findall
  NA            finditer
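
For instance, a quick illustration of the overlap using the current names:

>>> import re
>>> 'abcdef'.startswith('abc'), bool(re.match('abc', 'abcdef'))
(True, True)
>>> 'abcdef'.find('cde'), re.search('cde', 'abcdef').start()
(2, 2)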

/c


[Python-Dev] nice()

2006-02-12 Thread Smith



I've been thinking about a function that was recently proposed at python-dev 
named 'areclose'. It is a function that is meant to tell whether two (or 
possibly more) numbers are close to each other. It is a function similar to 
one that exists in Numeric. One such implementation is

def areclose(x,y,abs_tol=1e-8,rel_tol=1e-5):
    diff = abs(x-y)
    return diff <= abs_tol or diff <= rel_tol*max(abs(x),abs(y))

(This is the form given by Scott Daniels on python-dev.)
 
Anyway, one of the rationales for including such a function was:

  When teaching some programming to total newbies, a common frustration
  is how to explain why a==b is False when a and b are floats computed
  by different routes which ``should'' give the same results (if
  arithmetic had infinite precision). Decimals can help, but another
  approach I've found useful is embodied in Numeric.allclose(a,b) --
  which returns True if all items of the arrays are ``close'' (equal to
  within certain absolute and relative tolerances)

The problem with the above function, however, is that it *itself* has a 
comparison between floats and it will give an undesired result for something 
like the following test:
 
###
>>> print areclose(2, 2.1, .1, 0) #see if 2 and 2.1 are within 0.1 of each other
False
>>>
###
 
Here is an alternative that might be a nice companion to the repr() and 
round() functions: nice(). It is a combination of Tim Peters' delightful 
'case closed' presentation in the thread "Rounding to n significant digits?" 
[1] and the hidden magic of print's simplification of floating point numbers 
when asked to show them.

Its default behavior is to return a number in the form that the number would 
have when being printed. An optional argument, however, allows the user to 
specify the number of digits to round the number to, as counted from the most 
significant digit. (An alternative name, then, could be 'lround', but I think 
there is less baggage for the new user to think about if the name is something 
like nice()--a function that makes floating point numbers "play nice." And I 
also think the name...sounds nice.)
 
Here it is in action:

###
>>> 3*1.1==3.3
False
>>> nice(3*1.1)==nice(3.3)
True
>>> x=3.21/0.65; print x
4.93846153846
>>> print nice(x,2)
4.9
>>> x=x*1e5; print nice(x,2)
490000.0
###
 
Here's the function:

###
def nice(x,leadingDigits=0):
    """Return x either as 'print' would show it (the default) or
    rounded to the specified digit as counted from the leftmost
    non-zero digit of the number, e.g. nice(0.00326,2) --> 0.0033"""
    assert leadingDigits>=0
    if leadingDigits==0:
        return float(str(x)) #just give it back like 'print' would give it
    leadingDigits=int(leadingDigits)
    return float('%.*e' % (leadingDigits-1,x)) #round via the %e format
###
 
Might something like this be useful? For new users, no arguments are needed 
other than x, and floating point numbers suddenly seem to behave in tests made 
using nice() values. It's also useful for those computing who want to show a 
physically meaningful value that has been rounded to the appropriate digit as 
counted from the most significant digit rather than from the decimal point.

Some time back I had worked on the significant digit problem and had several 
math calls to figure out what the exponent was. The beauty of Tim's solution 
is that you just use built-in string formatting to do the work. Nice.
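
For comparison, here is my reconstruction of the kind of math-based approach 
that the %e trick replaces (not the original code):

###
import math
def siground(x, n):
    if x == 0: return 0.0
    e = math.floor(math.log10(abs(x)))  # exponent of the leading digit
    p = 10**(e - n + 1)                 # place value of the last kept digit
    return round(x/p)*p

print siground(0.00326, 2)  # 0.0033
###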
 
/c
 
[1] http://mail.python.org/pipermail/tutor/2004-July/030324.html


Re: [Python-Dev] Surely "nullable" is a reasonable name?

2014-08-04 Thread Nathaniel Smith
I admit I spent the first half of the email scratching my head and trying
to figure out what NULL had to do with argument clinic specs. (Maybe it
would mean that if the argument is "not given" in some appropriate way then
we set the corresponding C variable to NULL?) Finding out you were talking
about None came as a surprising twist.

-n
On 4 Aug 2014 08:13, "Larry Hastings"  wrote:

>
>
> Argument Clinic "converters" specify how to convert an individual argument
> to the function you're defining.  Although a converter could theoretically
> represent any sort of conversion, most of the time they directly represent
> types like "int" or "double" or "str".
>
> Because there's such variety in argument parsing, the converters are
> customizable with parameters.  Many of these are common enough that
> Argument Clinic suggests some standard names.  Examples: "zeroes=True" for
> strings and buffers means "permit internal \0 characters", and
> "bitwise=True" for unsigned integers means "copy the bits over, even if
> there's overflow/underflow, and even if the original is negative".
>
> A third example is "nullable=True", which means "also accept None for this
> parameter".  This was originally intended for use with strings (compare the
> "s" and "z" format units for PyArg_ParseTuple), however it looks like we'll
> have a use for "nullable ints" in the ongoing Argument Clinic conversion
> work.
>
> Several people have said they found the name "nullable" surprising,
> suggesting I use another name like "allow_none" or "noneable".  I, in turn,
> find their surprise surprising; "nullable" is a term long associated with
> exactly this concept.  It's used in C# and SQL, and the term even has its
> own Wikipedia page:
>
> http://en.wikipedia.org/wiki/Nullable_type
>
> Most amusingly, Vala *used* to have an annotation called "(allow-none)",
> but they've broken it out into two annotations, "(nullable)" and
> "(optional)".
>
>
> http://blogs.gnome.org/desrt/2014/05/27/allow-none-is-dead-long-live-nullable/
>
>
> Before you say "the term 'nullable' will confuse end users", let me remind
> you: this is not user-facing.  This is a parameter for an Argument Clinic
> converter, and will only ever be seen by CPython core developers.  A group
> which I hope is not so easily confused.
>
> It's my contention that "nullable" is the correct name.  But I've been
> asked to bring up the topic for discussion, to see if a consensus forms
> around this or around some other name.
>
> Let the bike-shedding begin,
>
>
> */arry*
>


[Python-Dev] performance delta with .py presence v.s. only pyc on python 2.7.x?

2014-10-07 Thread John Smith
My team has a python app that appears to perform 10+% better when the
python source is present v.s. when only the pyc's are distributed.  This is
contradictory to my understanding of how python leverages pyc's.

Scenario

1. pyc-only install sees mediocre performance. (pyc's are built using
compileall.py, then source .py's removed before packaging)
2. adding the .py files to the installed filesystem results in 10+%
performance improvement
3. verified no difference in mod times, no new .pyc's generated
4. remove the .py's and again see drop in performance


Env info: centos 6.5, x86_64, however python used is 32bit from the i686
repos (this is how we deal with some 32bit only c libraries that the app
must use)


Has anyone got an idea of what I might be running into?  The app itself is
a black box to me.  I was not able to isolate this using a simple piece of
python code that worked on a math problem for a discernible amount of time,
so it could be something to do with how the app is using python.  The app
was originally written for python 2.4...

Any suggestions on what I can try is appreciated.


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-09 Thread Nathaniel Smith
On Fri, Oct 10, 2014 at 1:29 AM, Victor Stinner
 wrote:
> Hi,
>
> Windows is not the primary target of Python developers, probably
> because most of them work on Linux. Official Python binaries are
> currently built by Microsoft Visual Studio. Even if Python developers
> get free licenses thanks to Microsoft, I would prefer to use an open
> source compiler if it were possible, so *anyone* can build Python
> from scratch. I don't like the requirement of having a license to build
> Python. The free version (Visual Studio Express) only supports 32-bit
> and doesn't support PGO builds (Profile-Guided Optimizations, which are
> disabled if I remember correctly because of compiler bugs).
>
> I know that it's hard to replace Visual Studio. I don't want to do it
> right now, but I would like to discuss that with you.
>
>
> === Open Watcom
>
> Jeffrey Armstrong is working on the Python support of OpenWatcom(v2), see:
> http://lightningpython.org/
> https://bitbucket.org/ArmstrongJ/lightning-python
>
> This compiler was initially written on MS-DOS in 32-bit, but it now
> supports Windows and Linux as well. The 64-bit mode is new and
> experimental. The Open Watcom "v2" project is actively developed at:
>
> https://github.com/open-watcom/open-watcom-v2/
>
> On Linux, Open Watcom doesn't support dynamic linking. On Windows, it
> uses its own C library. I'm not sure that Open Watcom is the best
> choice to build Python on Windows.
>
>
> === MinGW
>
> Some people tried to compile Python. See for example:
> https://bitbucket.org/puqing/python-mingw
>
> We even got some patches:
> http://bugs.python.org/issue3871 (rejected)
>
> See also:
> https://stackoverflow.com/questions/15365249/build-python-with-mingw-and-gcc
>
> MinGW reuses the Microsoft C library and is based on GCC, which is
> very stable, actively developed, supports a lot of architectures, etc.
> I guess that it should be possible to reuse third party GCC tools like
> the famous GDB debugger?

You may want to get in touch with Carl Kleffner -- he's done a bunch
of work lately on getting a mingw-based toolchain to the point where
it can build numpy and scipy. (This is pretty urgent for us because
(a) numerical work requires a BLAS library and the main competitive
open-source one -- OpenBLAS -- cannot be built by msvc because of asm
syntax issues, (b) msvc's fortran support is even worse than its C99
support.) Getting this working is non-trivial, since by default
mingw-compiled code depends on the GCC runtime libraries, the default
ABI doesn't match msvc, etc. But apparently these issues are all
fixable.

General info:
  https://github.com/numpy/numpy/wiki/Mingw-static-toolchain

The built toolchains etc.:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads

Readme:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/readme.txt

The patch to the numpy sources -- this in particular includes the
various distutils hacks needed to enable the crucial ABI-compatibility
switches:
  https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/numpy.patch

(Unfortunately he doesn't seem to have posted the build recipe for the
toolchain itself -- I'm sure he'd be happy to if you asked though.)

AFAICT the end result is a single free compiler toolchain that can
spit out 32- and 64-bit binaries using whichever MSVC runtime you
prefer.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Nathaniel Smith
On 11 Oct 2014 14:42, "Antoine Pitrou"  wrote:
>
> On Sat, 11 Oct 2014 00:30:51 + (UTC)
> Sturla Molden  wrote:
> > Larry Hastings  wrote:
> >
> > > CPython doesn't require OpenBLAS.  Not that I am not receptive to the
> > > needs of the numeric community... but, on the other hand, who in the
> > > hell releases a library with Windows support that doesn't work with
MSVC?!
> >
> > It uses AT&T assembly syntax instead of Intel assembly syntax.
>
> But you can compile OpenBLAS with one compiler and then link it to
> Python using another compiler, right? There is a single C ABI.

In theory, yes, but this is pretty misleading. The theory has been known
for years. In practice we've only managed to pull this off for the first
time within the last few months, and it requires one specific build of one
specific mingw fork with one specific set of build options, and only one
person understands the details. Hopefully all those things will continue to
be available and there aren't any showstopper bugs we haven't noticed yet...

(Before that we've spent the last 5+ years using a carefully preserved
build of an ancient 32 bit mingw and substandard BLAS .dll that were handed
down from our ancestors; we've had no capability to produce 64 bit official
builds at all. And a large proportion of users have been using 3rd party
proprietary builds.)

-n


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-11 Thread Nathaniel Smith
I'm not at all an expert on Fortran ABIs, but I think there are two
distinct issues being conflated here.

The first is that there is no standard way to look at some Fortran source
code and figure out the corresponding C API. When trying to call a Fortran
routine from C, then different Fortran compilers require different sorts of
name mangling, different ways of mapping Fortran concepts like "output
arguments" onto C concepts like "pointers", etc., so your C code needs to
have explicit knowledge of which Fortran compiler is in use. That's what
Sturla was referring to with the differences between g77 versus gfortran
etc. This is all very annoying, but it isn't a *deep* bad - it can be
solved by unilateral action by whatever project wants to link the Fortran
library.

The bigger problem is that getting a usable DLL at all is a serious
challenge. Some of the issues we deal with: (a) the classic, stable mingw
has no 64-bit support, (b) the only portable way to compile fortran (f2c)
only works for the ancient fortran 77, (c) getting even mingw-w64 to use a
msvc-compatible ABI is not trivial (I have no idea why this is the case,
but it is), (d) mingw-built dlls normally depend on the mingw runtime
dlls. Because these aren't
shipped globally with Python, they have to be either linked statically or
else a separate copy of them has to be placed into every directory that
contains any mingw-compiled extension module.

All the runtime and ABI issues do mean that it would be much easier to use
mingw(-w64) to build extension modules if Python itself were built with
mingw(-w64). Obviously this would in turn make it harder to build
extensions with MSVC, though, which would be a huge transition. I don't
know whether gcc's advantages (support for more modern C, better
cross-platform compatibility, better accessibility to non-windows-experts,
etc.) would outweigh the transition and other costs.

As an intermediate step, there are almost certainly things that could be
done to make it easier to use mingw-w64 to build python extensions, e.g.
teaching setuptools about how to handle the ABI issues. Maybe it would even
be possible to ship the mingw runtimes in some globally available location.

-n
On 11 Oct 2014 17:07, "Steve Dower"  wrote:

>   Is there some reason the Fortran part can't be separated out into a
> DLL? That's the C ABI Antoine was referring to, and most compilers can
> generate import libraries from binaries, even if the original compiler
> produced them in a different format.
>
> Top-posted from my Windows Phone
>  --
> From: Sturla Molden 
> Sent: ‎10/‎11/‎2014 7:22
> To: python-dev@python.org
> Subject: Re: [Python-Dev] Status of C compilers for Python on Windows
>
>   Antoine Pitrou  wrote:
>
> > It sound like whatever MSVC produces should be the defacto standard
> > under Windows.
>
> Yes, and that is what Clang does on Windows. It is not as usable as MinGW
> yet, but soon it will be. Clang also suffers from the lack of a Fortran
> compiler, though.
>
> Sturla
>


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-27 Thread Nathaniel Smith
On Mon, Oct 27, 2014 at 5:48 PM, Paul Moore  wrote:
> On 26 October 2014 23:44, Paul Moore  wrote:
>> On 26 October 2014 23:11, Ray Donnelly  wrote:
>>> I don't know where this "ABI compatible" thing came into being;
>>
>> Simple. If a mingw-built CPython doesn't work with the same extensions
>> as a MSVC-built CPython, then the community gets fragmented (because
>> you can only use the extensions built for your stack). Assuming numpy
>> needs mingw and ultimately only gets built for a mingw-compiled Python
>> (because the issues building for MSVC-built Python are too hard) and
>> assuming that nobody wants to make the effort to build pywin32 under
>> mingw, then what does someone who needs both numpy and pywin32 do?
>>
>> Avoiding that issue is what I mean by ABI-compatible. (And that's all
>> I mean by it, nothing more subtle or controversial).
>>
>> I view it as critical (because availability of binaries is *already*
>> enough of a problem in the Windows world, without making it worse)
>> that we avoid this sort of fragmentation. I'm not seeing an
>> acknowledgement from the mingw side that they agree. That's my
>> concern. If we both agree, there's nothing to argue about.
>
> I have just done some experiments with building CPython extensions
> with mingw-w64. Thanks to Ray for helping me set this up.
>
> The bad news is that the support added to the old 32-bit mingw to
> support linking to alternative C runtime libraries (specifically
> -lmsvcr100) has bitrotted, and no longer functions correctly in
> mingw-w64. As a result, not only can mingw-w64 not build extensions
> that are compatible with python.org Python, it can't build extensions
> that function at all [1]. They link incompatibly to *both* msvcrt and
> msvcr100.
>
> This is a bug in mingw-w64. I have reported it to Ray, who's passed it
> onto one of the mingw-w64 developers. But as things stand, mingw
> builds will definitely produce binary extensions that aren't
> compatible with python.org Python.

IIUC, getting mingw-w64 to link against msvcr100 instead of msvcrt
requires a custom mingw-w64 build, because by default mingw-w64's
internal runtime libraries (libgcc etc.) are linked against msvcrt. So
by the time you're choosing compiler switches etc., it's already too
late -- your switches might affect how *your* code is built, but your
code will still be linked against pre-existing runtime libraries that
are linked against msvcrt.

It's possible to hack the mingw-w64 build process to build the runtime
libraries against msvcr100 (or whatever) instead of msvcrt, but this
is still not a panacea -- the different msvcr* libraries are, of
course, incompatible with each other, and IIUC the mingw-w64
developers have never tried to make their libraries work against
anything except msvcrt. For example, mingw-w64's gfortran runtime uses
a symbol that's only available in msvcrt, not msvcr90 or msvcrt100:
  http://sourceforge.net/p/mingw-w64/mailman/message/31768118/

So my impression is that these issues are all fixable, but they will
require real engagement with mingw-w64 upstream.

> [1] Note, that's if you just use --compiler=mingw32 as supported by
> distutils. Looking at how the numpy folks build, they seem to hack
> their own version of the distutils C compiler classes. I don't know
> whether that's just to work around this bug, or whether they do it for
> other reasons as well (but I suspect the latter).

numpy.distutils is a massive pile of hacks to handle all kinds of
weird things including recursive builds, fortran, runtime capability
detection (like autoconf), and every random issue anyone ran into at
some point in the last 10 years and couldn't be bothered filing a
proper upstream bug report. Basically no-one knows what it actually
does -- the source is your only hope :-).

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-29 Thread Nathaniel Smith
On 29 Oct 2014 14:47, "Antoine Pitrou"  wrote:
>
> On Wed, 29 Oct 2014 10:31:50 -0400
> "R. David Murray"  wrote:
>
> > On Wed, 29 Oct 2014 10:22:14 -0400, Tres Seaver  wrote:
> > > On 10/28/2014 11:59 PM, Stephen J. Turnbull wrote:
> > >
> > > > most developers on Windows do have access to Microsoft tool
> > >
> > > I assume you mean python-dev folks who work on Windows: it is
> > > certainly not true for the vast majority of developers who use Python
> > > on Windows, who don't have the toolchain to build their own C
> > > extensions.
> >
> > I'm pretty sure he meant "most people who develop software for Windows",
> > even though that's not how he phrased it.  But this does not include, as
> > you point out, people who develop Python software that *also* works on
> > Windows.
> >
> > If you are writing code targeted for Windows, I think you are very
> > likely to have an MSDN subscription of some sort if your package
> > includes C code.  I'm sure it's not 100%, though.
>
> You can use Express editions of Visual Studio.

IIUC, the express edition compilers are 32-bit only, and what you actually
want are the "SDK compilers":
https://github.com/cython/cython/wiki/64BitCythonExtensionsOnWindows

These are freely downloadable by anyone, no msdn subscription required, but
only if you know where to find them!

AFAICT the main obstacle to using MSVC to build python extensions (assuming
it can handle your code at all) is not anything technical, but rather that
there's no clear and correct tutorial on how to do it, and lots of
confusion and misinformation circulating.

-n


Re: [Python-Dev] Status of C compilers for Python on Windows

2014-10-29 Thread Nathaniel Smith
On Wed, Oct 29, 2014 at 10:46 PM, Paul Moore  wrote:
> On 29 October 2014 22:19, Ethan Furman  wrote:
>>> Yeah, I wondered about that. I'll work up a patch for that. But the
>>> more I think about it, it really is trivial:
>>
>> I am reminded of an interview question I was once asked which was prefaced
>> with: "Here's an easy one..."
>>
>> My reply was, if you know the answer, it's easy!
>
> Yeah, I know what you mean. My take on this is that I agree it's not
> easy if you don't know and can't get access to the information, but if
> you can, there's very little to it.

That's great, but yeah. In case it helps as a data point, I consider
myself a reasonably technical guy, probably more than your average
python package author -- 15 years of free software hacking, designed a
pretty popular X remote display protocol, helped invent DVCS, have
written patches to gcc and the kernel, numpy co-maintainer, etc. etc.

For the last ~12 months I've had an underlined and circled todo item
saying "make wheels for https://pypi.python.org/pypi/zs/";, and every
time I've sat down and spent a few hours trying I've ended up utterly
defeated with a headache. Distutils is spaghetti, and Windows is
spaghetti (esp. if you've never used it for more than checking email
on a friend's laptop), and the toolchain setup is spaghetti, and then
suddenly people are talking about vcvarsall.bats and I don't know what
all. I honestly would totally benefit from one of those
talk-your-grandparents-through-it tutorials ("here's a screenshot of
the dialogue box, and I've drawn an arrow pointing at the 'ok' button.
You should press the 'ok' button.")

>>> - For non-free MSVC, install the appropriate version, and everything just
>>> works.
>>> - For Python 2.7 (32 or 64 bit), install the compiler for Python 2.7
>>> package and everything just works as long as you're using setuptools.
>>> - For 32 bit Python 3.2-3.4, install Visual Studio Express and
>>> everything just works.
>>> - For 64 bit Python 3.2-3.4, install the SDK, set some environment
>>> variables, and everything just works.

I think the SDK covers both 32 and 64 bit? If so then leaving visual
studio out of things entirely is probably simpler.

I'm also unsure how you even get both 32- and 64-bit versions of
python to coexist on the same system.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] Dinamically set __call__ method

2014-11-04 Thread Nathaniel Smith
On Tue, Nov 4, 2014 at 4:52 PM, Roberto Martínez
 wrote:
> Hi folks,
>
> I am trying to dynamically replace the __call__ method of an object using
> setattr.
> Example:
>
> $ cat testcall.py
> class A:
> def __init__(self):
> setattr(self, '__call__', self.newcall)
>
> def __call__(self):
> print("OLD")
>
> def newcall(self):
> print("NEW")
>
> a=A()
> a()
>
> I expect to get "NEW" instead of "OLD", but in Python 3.4 I get "OLD".
>
> $ python2.7 testcall.py
> NEW
> $ python3.4 testcall.py
> OLD
>
> I have a few questions:
>
> - Is this an expected behavior?
> - Is it possible to replace __call__ dynamically in Python 3? How?

For new-style classes, special methods like __call__ are looked up
directly on the class, not on the object itself. If you want to change
the result of doing a(), then you need to reassign A.__call__, not
a.__call__.

In python 3, all classes are new-style classes. In python 2, only
classes that inherit from 'object' are new-style classes. (If you
replace 'class A:' with 'class A(object):' then you'll see the same
behaviour on both py2 and py3.)

Easy workaround:

def __call__(self, *args, **kwargs):
    return self._my_call(*args, **kwargs)

Now you can assign a._my_call to be whatever you want.
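
To make that concrete, here's a minimal sketch of the example class rewritten
with that workaround (it behaves the same on py2 and py3):

class A:
    def __init__(self):
        # replace the per-instance target instead of __call__ itself
        self._my_call = self.newcall

    def __call__(self, *args, **kwargs):
        # looked up on the class, as new-style classes require, but it
        # defers to an ordinary attribute that can be freely reassigned
        return self._my_call(*args, **kwargs)

    def newcall(self):
        print("NEW")

a = A()
a()  # prints NEW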

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] OneGet provider for Python

2014-11-15 Thread Nathaniel Smith
On 15 Nov 2014 10:10, "Paul Moore"  wrote:
>
> > Incidentally, it would be really useful if python.org provided stable
> > url's that always redirected to the latest .msi installers, for
> > bootstrapping purposes. I'd prefer to not rely on chocolatey (or on
> > scraping the web site) for this.
>
> https://www.python.org/ftp/python/$ver/python-$ver.msi
> https://www.python.org/ftp/python/$ver/python-$ver.amd64.msi

Right, but what's the URL for "the latest 2.7.x release" or "the latest
3.x.x release"?

-n


[Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-28 Thread Nathaniel Smith
Hi all,

There was some discussion on python-ideas last month about how to make
it easier/more reliable for a module to override attribute access.
This is useful for things like autoloading submodules (accessing
'foo.bar' triggers the import of 'bar'), or for deprecating module
attributes that aren't functions. (Accessing 'foo.bar' emits a
DeprecationWarning, "the bar attribute will be removed soon".) Python
has had some basic support for this for a long time -- if a module
overwrites its entry in sys.modules[__name__], then the object that's
placed there will be returned by 'import'. This allows one to define
custom subclasses of module and use them instead of the default,
similar to how metaclasses allow one to use custom subclasses of
'type'.
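
(To make the mechanism concrete, here's a minimal sketch of that trick; the
module name and the deprecated attribute are made up for illustration:

    # mymod.py
    import sys, types, warnings

    class _MyModule(types.ModuleType):
        def __getattr__(self, name):
            if name == 'bar':
                warnings.warn("the bar attribute will be removed soon",
                              DeprecationWarning)
                return 42  # stand-in for the deprecated value
            raise AttributeError(name)

    sys.modules[__name__] = _MyModule(__name__)

Note that the replacement module starts with a fresh, empty namespace --
which is exactly the problem described next.)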

In practice though it's very difficult to make this work safely and
correctly for a top-level package. The main problem is that when you
create a new object to stick into sys.modules, this necessarily means
creating a new namespace dict. And now you have a mess, because now
you have two dicts: new_module.__dict__ which is the namespace you
export, and old_module.__dict__, which is the globals() for the code
that's trying to define the module namespace. Keeping these in sync is
extremely error-prone -- consider what happens, e.g., when your
package __init__.py wants to import submodules which then recursively
import the top-level package -- so it's difficult to justify for the
kind of large packages that might be worried about deprecating entries
in their top-level namespace. So what we'd really like is a way to
somehow end up with an object that (a) has the same __dict__ as the
original module, but (b) is of our own custom module subclass. If we
can do this then metamodules will become safe and easy to write
correctly.

(There's a little demo of working metamodules here:
   https://github.com/njsmith/metamodule/
but it uses ctypes hacks that depend on non-stable parts of the
CPython ABI, so it's not a long-term solution.)

I've now spent some time trying to hack this capability into CPython
and I've made a list of the possible options I can think of to fix
this. I'm writing to python-dev because none of them are obviously The
Right Way so I'd like to get some opinions/ruling/whatever on which
approach to follow up on.

Option 1: Make it possible to change the type of a module object
in-place, so that we can write something like

   sys.modules[__name__].__class__ = MyModuleSubclass

Option 1 downside: The invariants required to make __class__
assignment safe are complicated, and only implemented for
heap-allocated type objects. PyModule_Type is not heap-allocated, so
making this work would require lots of delicate surgery to
typeobject.c. I'd rather not go down that rabbit-hole.



Option 2: Make PyModule_Type into a heap type allocated at interpreter
startup, so that the above just works.

Option 2 downside: PyModule_Type is exposed as a statically-allocated
global symbol, so doing this would involve breaking the stable ABI.



Option 3: Make it legal to assign to the __dict__ attribute of a
module object, so that we can write something like

   new_module = MyModuleSubclass(...)
   new_module.__dict__ = sys.modules[__name__].__dict__
   sys.modules[__name__].__dict__ = {} # ***
   sys.modules[__name__] = new_module

The line marked *** is necessary because the way modules are designed,
they expect to control the lifecycle of their __dict__. When the
module object is initialized, it fills in a bunch of stuff in the
__dict__. When the module object (not the dict object!) is
deallocated, it deletes everything from the __dict__. This latter
feature in particular means that having two module objects sharing the
same __dict__ is bad news.

Option 3 downside: The paragraph above. Also, there's stuff inside the
module struct besides just the __dict__, and more stuff has appeared
there over time.



Option 4: Add a new function sys.swap_module_internals, which takes
two module objects and swaps their __dict__ and other attributes. By
making the operation a swap instead of an assignment, we avoid the
lifecycle pitfalls from Option 3. By making it a builtin, we can make
sure it always handles all the module fields that matter, not just
__dict__. Usage:

   new_module = MyModuleSubclass(...)
   sys.swap_module_internals(new_module, sys.modules[__name__])
   sys.modules[__name__] = new_module

Option 4 downside: Obviously a hack.



Option 3 or 4 both seem workable, it just depends on which way we
prefer to hold our nose. Option 4 is slightly more correct in that it
works for *all* modules, but OTOH at the moment the only time Option 3
*really* fails is for compiled modules with PEP 3121 metadata, and
compiled modules can already use a module subclass via other means
(since they instantiate their own module objects).

Tho

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-29 Thread Nathaniel Smith
On Sat, Nov 29, 2014 at 4:21 AM, Guido van Rossum  wrote:
> Are these really all our options? All of them sound like hacks, none of them
> sound like anything the language (or even the CPython implementation) should
> sanction. Have I missed the discussion where the use cases and constraints
> were analyzed and all other approaches were rejected? (I might have some
> half-baked ideas, but I feel I should read up on the past discussion first,
> and they are probably more fit for python-ideas than for python-dev. Plus
> I'm just writing this email because I'm procrastinating on the type hinting
> PEP. :-)

The previous discussions I was referring to are here:
  http://thread.gmane.org/gmane.comp.python.ideas/29487/focus=29555
  http://thread.gmane.org/gmane.comp.python.ideas/29788

There might well be other options; these are just the best ones I
could think of :-). The constraints are pretty tight, though:
- The "new module" object (whatever it is) should have a __dict__ that
aliases the original module globals(). I can elaborate on this if my
original email wasn't enough, but hopefully it's obvious that making
two copies of the same namespace and then trying to keep them in sync
at the very least smells bad :-).
- The "new module" object has to be a subtype of ModuleType, b/c there
are lots of places that do isinstance(x, ModuleType) checks (notably
-- but not only -- reload()). Since a major goal here is to make it
possible to do cleaner deprecations, it would be really unfortunate if
switching an existing package to use the metamodule support itself
broke things :-).
- Lookups in the normal case should have no additional performance
overhead, because module lookups are extremely extremely common. (So
this rules out dict proxies and tricks like that -- we really need
'new_module.__dict__ is globals()' to be true.)

AFAICT there are three logically possible strategies for satisfying
that first constraint:
(a) convert the original module object into the type we want, in-place
(b) create a new module object that acts like the original module object
(c) somehow arrange for our special type to be used from the start

My options 1 and 2 are means of accomplishing (a), and my options 3
and 4 are means of accomplishing (b) while working around the
behavioural quirks of module objects (as required by the second
constraint).

The python-ideas thread did also consider several methods of
implementing strategy (c), but they're messy enough that I left them
out here. The problem is that somehow we have to execute code to
create the new subtype *before* we have an entry in sys.modules for
the package that contains the code for the subtype. So one option
would be to add a new rule, that if a file pkgname/__new__.py exists,
then this is executed first and is required to set up
sys.modules["pkgname"] before we exec pkgname/__init__.py. So
pkgname/__new__.py might look like:

import sys
from pkgname._metamodule import MyModuleSubtype
sys.modules[__name__] = MyModuleSubtype(__name__, docstring)

This runs into a lot of problems though. To start with, the 'from
pkgname._metamodule ...' line is an infinite loop, b/c this is the
code used to create sys.modules["pkgname"]. It's not clear where the
globals dict for executing __new__.py comes from (who defines
__name__? Currently that's done by ModuleType.__init__). It only works
for packages, not modules. The need to provide the docstring here,
before __init__.py is even read, is weird. It adds extra stat() calls
to every package lookup. And, the biggest showstopper IMHO: AFAICT
it's impossible to write a polyfill to support this code on old python
versions, so it's useless to any package which needs to keep
compatibility with 2.7 (or even 3.4). Sure, you can backport the whole
import system like importlib2, but telling everyone that they need to
replace every 'import numpy' with 'import importlib2; import numpy' is
a total non-starter.

So, yeah, those 4 options are really the only plausible ones I know of.

Option 1 and option 3 are pretty nice at the language level! Most
Python objects allow assignment to __class__ and __dict__, and both
PyPy and Jython at least do support __class__ assignment. Really the
only downside with Option 1 is that actually implementing it requires
attention from someone with deep knowledge of typeobject.c.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-29 Thread Nathaniel Smith
On Sat, Nov 29, 2014 at 11:32 AM, Antoine Pitrou  wrote:
> On Sat, 29 Nov 2014 01:59:06 +
> Nathaniel Smith  wrote:
>>
>> Option 1: Make it possible to change the type of a module object
>> in-place, so that we can write something like
>>
>>sys.modules[__name__].__class__ = MyModuleSubclass
>>
>> Option 1 downside: The invariants required to make __class__
>> assignment safe are complicated, and only implemented for
>> heap-allocated type objects. PyModule_Type is not heap-allocated, so
>> making this work would require lots of delicate surgery to
>> typeobject.c. I'd rather not go down that rabbit-hole.
>
> Option 1b: have __class__ assignment delegate to a tp_classassign slot
> on the old class, so that typeobject.c doesn't have to be cluttered with
> many special cases.

I'm intrigued -- how would this help?

I have a vague impression that one could add another branch to
object_set_class that went something like

if at least one of the types is a subtype of the other type, and the
subtype is a heap type with tp_dealloc == subtype_dealloc, and the
subtype doesn't add any important slots, and ... then the __class__
assignment is legal.

(This is taking advantage of the fact that if you don't have any extra
slots added, then subtype_dealloc just basically defers to the base
type's tp_dealloc, so it doesn't really matter which one you end up
calling.)

And my vague impression is that there isn't really anything special
about the module type that would allow a tp_classassign function to
simplify this logic.

But these are just vague impressions :-)

>> Option 3: Make it legal to assign to the __dict__ attribute of a
>> module object, so that we can write something like
>>
>>new_module = MyModuleSubclass(...)
>>new_module.__dict__ = sys.modules[__name__].__dict__
>>sys.modules[__name__].__dict__ = {} # ***
>>sys.modules[__name__] = new_module
>>
> [...]
>>
>> Option 4: Add a new function sys.swap_module_internals, which takes
>> two module objects and swaps their __dict__ and other attributes. By
>> making the operation a swap instead of an assignment, we avoid the
>> lifecycle pitfalls from Option 3. By making it a builtin, we can make
>> sure it always handles all the module fields that matter, not just
>> __dict__. Usage:
>
> How do these two options interact with the fact that module functions
> store their globals dict, not the module itself?

I think that's totally fine? The whole point of all these proposals is
to make sure that the final module object does in fact have the
correct globals dict.

~$ git clone git@github.com:njsmith/metamodule.git
~$ cd metamodule
~/metamodule$ python3.4
>>> import examplepkg
>>> examplepkg

>>> examplepkg.f.__globals__ is examplepkg.__dict__
True

If anything this is another argument for why we NEED something like this :-).

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 2:54 AM, Guido van Rossum  wrote:
> All the use cases seem to be about adding some kind of getattr hook to
> modules. They all seem to involve modifying the CPython C code anyway. So
> why not tackle that problem head-on and modify module_getattro() to look for
> a global named __getattr__ and if it exists, call that instead of raising
> AttributeError?

You need to allow overriding __dir__ as well for tab-completion, and
some people wanted to use the properties API instead of raw
__getattr__, etc. Maybe someone will want __getattribute__ semantics,
I dunno. So since we're *so close* to being able to just use the
subclassing machinery, it seemed cleaner to try and get that working
instead of reimplementing bits of it piecewise.

That said, __getattr__ + __dir__ would be enough for my immediate use cases.

-n

> On Sat, Nov 29, 2014 at 11:37 AM, Nathaniel Smith  wrote:
>>
>> On Sat, Nov 29, 2014 at 4:21 AM, Guido van Rossum 
>> wrote:
>> > Are these really all our options? All of them sound like hacks, none of
>> > them
>> > sound like anything the language (or even the CPython implementation)
>> > should
>> > sanction. Have I missed the discussion where the use cases and
>> > constraints
>> > were analyzed and all other approaches were rejected? (I might have some
>> > half-baked ideas, but I feel I should read up on the past discussion
>> > first,
>> > and they are probably more fit for python-ideas than for python-dev.
>> > Plus
>> > I'm just writing this email because I'm procrastinating on the type
>> > hinting
>> > PEP. :-)
>>
>> The previous discussions I was referring to are here:
>>   http://thread.gmane.org/gmane.comp.python.ideas/29487/focus=29555
>>   http://thread.gmane.org/gmane.comp.python.ideas/29788
>>
>> There might well be other options; these are just the best ones I
>> could think of :-). The constraints are pretty tight, though:
>> - The "new module" object (whatever it is) should have a __dict__ that
>> aliases the original module globals(). I can elaborate on this if my
>> original email wasn't enough, but hopefully it's obvious that making
>> two copies of the same namespace and then trying to keep them in sync
>> at the very least smells bad :-).
>> - The "new module" object has to be a subtype of ModuleType, b/c there
>> are lots of places that do isinstance(x, ModuleType) checks (notably
>> -- but not only -- reload()). Since a major goal here is to make it
>> possible to do cleaner deprecations, it would be really unfortunate if
>> switching an existing package to use the metamodule support itself
>> broke things :-).
>> - Lookups in the normal case should have no additional performance
>> overhead, because module lookups are extremely extremely common. (So
>> this rules out dict proxies and tricks like that -- we really need
>> 'new_module.__dict__ is globals()' to be true.)
>>
>> AFAICT there are three logically possible strategies for satisfying
>> that first constraint:
>> (a) convert the original module object into the type we want, in-place
>> (b) create a new module object that acts like the original module object
>> (c) somehow arrange for our special type to be used from the start
>>
>> My options 1 and 2 are means of accomplishing (a), and my options 3
>> and 4 are means of accomplishing (b) while working around the
>> behavioural quirks of module objects (as required by the second
>> constraint).
>>
>> The python-ideas thread did also consider several methods of
>> implementing strategy (c), but they're messy enough that I left them
>> out here. The problem is that somehow we have to execute code to
>> create the new subtype *before* we have an entry in sys.modules for
>> the package that contains the code for the subtype. So one option
>> would be to add a new rule, that if a file pkgname/__new__.py exists,
>> then this is executed first and is required to set up
>> sys.modules["pkgname"] before we exec pkgname/__init__.py. So
>> pkgname/__new__.py might look like:
>>
>> import sys
>> from pkgname._metamodule import MyModuleSubtype
>> sys.modules[__name__] = MyModuleSubtype(__name__, docstring)
>>
>> This runs into a lot of problems though. To start with, the 'from
>> pkgname._metamodule ...' line is an infinite loop, b/c this is the
>> code used to create sys.modules["pkgname"]. It's not clear where the
>> globals dict for executing __new__.py comes from (who defines
>> __name__? Currently that's done by ModuleType.__init__). [...]

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 7:27 PM, Ethan Furman  wrote:
> On 11/30/2014 11:15 AM, Guido van Rossum wrote:
>> On Sun, Nov 30, 2014 at 6:15 AM, Brett Cannon wrote:
>>> On Sat, Nov 29, 2014, 21:55 Guido van Rossum wrote:
>>>>
>>>> All the use cases seem to be about adding some kind of getattr hook
>>>> to modules. They all seem to involve modifying the CPython C code
>>>> anyway. So why not tackle that problem head-on and modify module_getattro()
>>>> to look for a global named __getattr__ and if it exists, call that instead
>>>> of raising AttributeError?
>>>
>>> Not sure if anyone thought of it. :) Seems like a reasonable solution to me.
>>> Be curious to know what the benchmark suite said the impact was.
>>
>> Why would there be any impact? The __getattr__ hook would be similar to the
>> one on classes -- it's only invoked at the point where otherwise 
>> AttributeError
>> would be raised.
>
> I think the bigger question is how do we support it back on 2.7?

I think that's doable -- assuming I'm remembering correctly the
slightly weird class vs. instance lookup rules for special methods,
you can write a module subclass like

import types

class GetAttrModule(types.ModuleType):
    def __getattr__(self, name):
        return self.__dict__["__getattr__"](name)

and then use ctypes hacks to get it into sys.modules[__name__].
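
For the curious, a rough sketch of what such a ctypes hack might look like.
This is CPython-specific, assumes a release build (object header = ob_refcnt
followed by ob_type), and skips the type-object refcounting that a careful
version would do:

import ctypes, sys, types

class GetAttrModule(types.ModuleType):
    def __getattr__(self, name):
        return self.__dict__["__getattr__"](name)

mod = sys.modules[__name__]
# overwrite mod's ob_type pointer in place; it sits right after the refcount
ctypes.c_size_t.from_address(
    id(mod) + ctypes.sizeof(ctypes.c_size_t)).value = id(GetAttrModule)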

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
> Hi,
>
> This discussion has been going on for a while, but no one has questioned the
> basic premise. Does this need any change to the language or interpreter?
>
> I believe it does not. I've modified your original metamodule.py to not use
> ctypes and support reloading:
> https://gist.github.com/markshannon/1868e7e6115d70ce6e76

Interesting approach!

As written, your code will blow up on any python < 3.4, because when
old_module gets deallocated it'll wipe the module dict clean. And I
guess even on >=3.4, this might still happen if old_module somehow
manages to get itself into a reference loop before getting
deallocated. (Hopefully not, but what a nightmare to debug if it did.)
However, both of these issues can be fixed by stashing a reference to
old_module somewhere in new_module.

The __class__ = ModuleType trick is super-clever but makes me
irrationally uncomfortable. I know that this is documented as a valid
method of fooling isinstance(), but I didn't know that until your
email yesterday, and the idea of objects where type(foo) is not
foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
fine though.
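
(For anyone who hasn't seen the trick, a minimal illustration -- names made
up:)

import types

class PseudoModule(object):
    # the trick: a class attribute shadowing the real type
    __class__ = types.ModuleType

p = PseudoModule()
print(isinstance(p, types.ModuleType))  # True -- isinstance falls back to __class__
print(type(p) is p.__class__)           # False -- the "blasphemous" part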

The pseudo-module objects generated this way still won't pass
PyModule_Check, so in theory this could produce behavioural
differences. I can't name any specific places where this will break
things, though. From a quick skim of the CPython source, a few
observations: It means the PyModule_* API functions won't work (e.g.
PyModule_GetDict); maybe these aren't used enough to matter. It looks
like the __reduce__ methods on "method objects"
(Objects/methodobject.c) have a special check for ->m_self being a
module object, and won't pickle correctly if ->m_self ends up pointing
to one of these pseudo-modules. I have no idea how one ends up with a
method whose ->m_self points to a module object, though -- maybe it
never actually happens. PyImport_Cleanup treats module objects
differently from non-module objects during shutdown.

I guess it also has the mild limitation that it doesn't work with
extension modules, but eh. Mostly I'd be nervous about the two points
above.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 12:59 AM, Nathaniel Smith  wrote:
> On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
>> Hi,
>>
>> This discussion has been going on for a while, but no one has questioned the
>> basic premise. Does this need any change to the language or interpreter?
>>
>> I believe it does not. I've modified your original metamodule.py to not use
>> ctypes and support reloading:
>> https://gist.github.com/markshannon/1868e7e6115d70ce6e76
>
> Interesting approach!
>
> As written, your code will blow up on any python < 3.4, because when
> old_module gets deallocated it'll wipe the module dict clean. And I
> guess even on >=3.4, this might still happen if old_module somehow
> manages to get itself into a reference loop before getting
> deallocated. (Hopefully not, but what a nightmare to debug if it did.)
> However, both of these issues can be fixed by stashing a reference to
> old_module somewhere in new_module.
>
> The __class__ = ModuleType trick is super-clever but makes me
> irrationally uncomfortable. I know that this is documented as a valid
> method of fooling isinstance(), but I didn't know that until your
> email yesterday, and the idea of objects where type(foo) is not
> foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
> fine though.
>
> The pseudo-module objects generated this way still won't pass
> PyModule_Check, so in theory this could produce behavioural
> differences. I can't name any specific places where this will break
> things, though. From a quick skim of the CPython source, a few
> observations: It means the PyModule_* API functions won't work (e.g.
> PyModule_GetDict); maybe these aren't used enough to matter. It looks
> like the __reduce__ methods on "method objects"
> (Objects/methodobject.c) have a special check for ->m_self being a
> module object, and won't pickle correctly if ->m_self ends up pointing
> to one of these pseudo-modules. I have no idea how one ends up with a
> method whose ->m_self points to a module object, though -- maybe it
> never actually happens. PyImport_Cleanup treats module objects
> differently from non-module objects during shutdown.

Actually, there is one showstopper here -- the first version where
reload() uses isinstance() is actually 3.4. Before that you need a
real module subtype for reload to work. But this is in principle
workaroundable by using subclassing + ctypes on old versions of python
and the __class__ = hack on new versions.

> I guess it also has the mild limitation that it doesn't work with
> extension modules, but eh. Mostly I'd be nervous about the two points
> above.
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org



-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Sun, Nov 30, 2014 at 8:54 PM, Guido van Rossum  wrote:
> On Sun, Nov 30, 2014 at 11:29 AM, Nathaniel Smith  wrote:
>>
>> On Sun, Nov 30, 2014 at 2:54 AM, Guido van Rossum 
>> wrote:
>> > All the use cases seem to be about adding some kind of getattr hook to
>> > modules. They all seem to involve modifying the CPython C code anyway.
>> > So
>> > why not tackle that problem head-on and modify module_getattro() to look
>> > for
>> > a global named __getattr__ and if it exists, call that instead of
>> > raising
>> > AttributeError?
>>
>> You need to allow overriding __dir__ as well for tab-completion, and
>> some people wanted to use the properties API instead of raw
>> __getattr__, etc. Maybe someone will want __getattribute__ semantics,
>> I dunno.
>
> Hm... I agree about __dir__ but the other things feel too speculative.
>
>> So since we're *so close* to being able to just use the
>> subclassing machinery, it seemed cleaner to try and get that working
>> instead of reimplementing bits of it piecewise.
>
> That would really be option 1, right? It's the one that looks cleanest from
> the user's POV (or at least from the POV of a developer who wants to build a
> framework using this feature -- for a simple one-off use case, __getattr__
> sounds pretty attractive). I think that if we really want option 1, the
> issue of PyModuleType not being a heap type can be dealt with.

Options 1-4 all have the effect of making it fairly simple to slot an
arbitrary user-defined module subclass into sys.modules. Option 1 is
the cleanest API though :-).

>>
>> That said, __getattr__ + __dir__ would be enough for my immediate use
>> cases.
>
>
>  Perhaps it would be a good exercise to try and write the "lazy submodule
> import"(*) use case three ways: (a) using only CPython 3.4; (b) using
> __class__ assignment; (c) using customizable __getattr__ and __dir__. I
> think we can learn a lot about the alternatives from this exercise. I
> presume there's already a version of (a) floating around, but if it's been
> used in practice at all, it's probably too gnarly to serve as a useful
> comparison (though its essence may be extracted to serve as such).

(b) and (c) are very straightforward and trivial. Probably I could do
a better job of faking dir()'s default behaviour on modules, but
basically:

# __class__ assignment #

import sys, types, importlib

class MyModule(types.ModuleType):
    def __getattr__(self, name):
        if name in _lazy_submodules:
            # the import machinery implicitly assigns the submodule to
            # self.__dict__[name] as a side effect
            return importlib.import_module("." + name, package=self.__package__)
        raise AttributeError(name)

    def __dir__(self):
        entries = set(self.__dict__)
        entries.update(_lazy_submodules)
        return sorted(entries)

sys.modules[__name__].__class__ = MyModule
_lazy_submodules = {"foo", "bar"}

# customizable __getattr__ and __dir__ #

import importlib

def __getattr__(name):
    if name in _lazy_submodules:
        # the import machinery implicitly assigns the submodule to
        # globals()[name] as a side effect
        return importlib.import_module("." + name, package=__package__)
    raise AttributeError(name)

def __dir__():
    entries = set(globals())
    entries.update(_lazy_submodules)
    return sorted(entries)

_lazy_submodules = {"foo", "bar"}

> FWIW I believe all proposals here have a big limitation: the module *itself*
> cannot benefit much from all these shenanigans, because references to
> globals from within the module's own code are just dictionary accesses, and
> we don't want to change that.

I think that's fine -- IMHO the main use cases here are about
controlling the public API. And a module that really wants to can
always import itself if it wants to pull more shenanigans :-) (i.e.,
foo/__init__.py can do "import foo; foo.blahblah" instead of just
"blahblah".)

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-11-30 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 1:27 AM, Guido van Rossum  wrote:
> Nathaniel, did you look at Brett's LazyLoader? It overcomes the subclass
> issue by using a module loader that makes all modules instances of a
> (trivial) Module subclass. I'm sure this approach can be backported as far
> as you need to go.

The problem is that by the time your package's code starts running,
it's too late to install such a loader. Brett's strategy works well
for lazy-loading submodules (e.g., making it so 'import numpy' makes
'numpy.testing' available, but without the speed hit of importing it
immediately), but it doesn't help if you want to actually hook
attribute access on your top-level package (e.g., making 'numpy.foo'
trigger a DeprecationWarning -- we have a lot of stupid exported
constants that we can never get rid of because our rules say that we
have to deprecate things before removing them).
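
(To make that use case concrete: here's roughly the subclass we'd like to
write, assuming one of the options above exists to install it -- all names
invented for illustration:)

import types, warnings

_deprecated_exports = {"foo": 1}  # hypothetical legacy constants

class DeprecatingModule(types.ModuleType):
    def __getattr__(self, name):
        # only reached when normal lookup fails, so the fast path is untouched
        if name in _deprecated_exports:
            warnings.warn("{}.{} is deprecated".format(self.__name__, name),
                          DeprecationWarning, stacklevel=2)
            return _deprecated_exports[name]
        raise AttributeError(name)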

Or maybe you're suggesting that we define a trivial heap-allocated
subclass of PyModule_Type and use that everywhere, as a
quick-and-dirty way to enable __class__ assignment? (E.g., return it
from PyModule_New?) I considered this before but hesitated b/c it
could potentially break backwards compatibility -- e.g. if code A
creates a PyModule_Type object directly without going through
PyModule_New, and then code B checks whether the resulting object is a
module by doing isinstance(x, type(sys)), this will break. (type(sys)
is a pretty common way to get a handle to ModuleType -- in fact both
types.py and importlib use it.) So in my mind I sorta lumped it in
with my Option 2, "minor compatibility break". OTOH maybe anyone who
creates a module object without going through PyModule_New deserves
whatever they get.
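
(For reference, the idiom in question -- essentially how types.py itself
gets hold of ModuleType:)

import sys

ModuleType = type(sys)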

-n

> On Sun, Nov 30, 2014 at 5:02 PM, Nathaniel Smith  wrote:
>>
>> On Mon, Dec 1, 2014 at 12:59 AM, Nathaniel Smith  wrote:
>> > On Sun, Nov 30, 2014 at 10:14 PM, Mark Shannon  wrote:
>> >> Hi,
>> >>
>> >> This discussion has been going on for a while, but no one has
>> >> questioned the
>> >> basic premise. Does this need any change to the language or
>> >> interpreter?
>> >>
>> >> I believe it does not. I've modified your original metamodule.py to not
>> >> use
>> >> ctypes and support reloading:
>> >> https://gist.github.com/markshannon/1868e7e6115d70ce6e76
>> >
>> > Interesting approach!
>> >
>> > As written, your code will blow up on any python < 3.4, because when
>> > old_module gets deallocated it'll wipe the module dict clean. And I
>> > guess even on >=3.4, this might still happen if old_module somehow
>> > manages to get itself into a reference loop before getting
>> > deallocated. (Hopefully not, but what a nightmare to debug if it did.)
>> > However, both of these issues can be fixed by stashing a reference to
>> > old_module somewhere in new_module.
>> >
>> > The __class__ = ModuleType trick is super-clever but makes me
>> > irrationally uncomfortable. I know that this is documented as a valid
>> > method of fooling isinstance(), but I didn't know that until your
>> > email yesterday, and the idea of objects where type(foo) is not
>> > foo.__class__ strikes me as somewhat blasphemous. Maybe this is all
>> > fine though.
>> >
>> > The pseudo-module objects generated this way still won't pass
>> > PyModule_Check, so in theory this could produce behavioural
>> > differences. I can't name any specific places where this will break
>> > things, though. From a quick skim of the CPython source, a few
>> > observations: It means the PyModule_* API functions won't work (e.g.
>> > PyModule_GetDict); maybe these aren't used enough to matter. It looks
>> > like the __reduce__ methods on "method objects"
>> > (Objects/methodobject.c) have a special check for ->m_self being a
>> > module object, and won't pickle correctly if ->m_self ends up pointing
>> > to one of these pseudo-modules. I have no idea how one ends up with a
>> > method whose ->m_self points to a module object, though -- maybe it
>> > never actually happens. PyImport_Cleanup treats module objects
>> > differently from non-module objects during shutdown.
>>
>> Actually, there is one showstopper here -- the first version where
>> reload() uses isinstance() is actually 3.4. Before that you need a
>> real module subtype for reload to work. But this is in principle
>> workaroundable by using subclassing + ctypes on old versions of python
>> and the __class__ = hack on new versions.

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-12-01 Thread Nathaniel Smith
On Mon, Dec 1, 2014 at 4:06 AM, Guido van Rossum  wrote:
> On Sun, Nov 30, 2014 at 5:42 PM, Nathaniel Smith  wrote:
>>
>> On Mon, Dec 1, 2014 at 1:27 AM, Guido van Rossum  wrote:
>> > Nathaniel, did you look at Brett's LazyLoader? It overcomes the subclass
>> > issue by using a module loader that makes all modules instances of a
>> > (trivial) Module subclass. I'm sure this approach can be backported as
>> > far
>> > as you need to go.
>>
>> The problem is that by the time your package's code starts running,
>> it's too late to install such a loader. Brett's strategy works well
>> for lazy-loading submodules (e.g., making it so 'import numpy' makes
>> 'numpy.testing' available, but without the speed hit of importing it
>> immediately), but it doesn't help if you want to actually hook
>> attribute access on your top-level package (e.g., making 'numpy.foo'
>> trigger a DeprecationWarning -- we have a lot of stupid exported
>> constants that we can never get rid of because our rules say that we
>> have to deprecate things before removing them).
>>
>> Or maybe you're suggesting that we define a trivial heap-allocated
>> subclass of PyModule_Type and use that everywhere, as a
>> quick-and-dirty way to enable __class__ assignment? (E.g., return it
>> from PyModule_New?) I considered this before but hesitated b/c it
>> could potentially break backwards compatibility -- e.g. if code A
>> creates a PyModule_Type object directly without going through
>> PyModule_New, and then code B checks whether the resulting object is a
>> module by doing isinstance(x, type(sys)), this will break. (type(sys)
>> is a pretty common way to get a handle to ModuleType -- in fact both
>> types.py and importlib use it.) So in my mind I sorta lumped it in
>> with my Option 2, "minor compatibility break". OTOH maybe anyone who
>> creates a module object without going through PyModule_New deserves
>> whatever they get.
>
>
> Couldn't you install a package loader using some install-time hook?
>
> Anyway, I still think that the issues with heap types can be overcome. Hm,
> didn't you bring that up before here? Was the conclusion that it's
> impossible?

I've brought it up several times but no-one's really discussed it :-).
I finally attempted a deep dive into typeobject.c today myself. I'm
not at all sure I understand the intricacies correctly here, but I
*think* __class__ assignment could be relatively easily extended to
handle non-heap types, and in fact the current restriction to heap
types is actually buggy (IIUC).

object_set_class is responsible for checking whether it's okay to take
an object of class "oldto" and convert it to an object of class
"newto". Basically it's goal is just to avoid crashing the interpreter
(as would quickly happen if you e.g. allowed "[].__class__ = dict").
Currently the rules (spread across object_set_class and
compatible_for_assignment) are:

(1) both oldto and newto have to be heap types
(2) they have to have the same tp_dealloc
(3) they have to have the same tp_free
(4) if you walk up the ->tp_base chain for both types until you find
the most-ancestral type that has a compatible struct layout (as
checked by equiv_structs), then either
   (4a) these ancestral types have to be the same, OR
   (4b) these ancestral types have to have the same tp_base, AND they
have to have added the same slots on top of that tp_base (e.g. if you
have class A(object): pass and class B(object): pass then they'll both
have added a __dict__ slot at the same point in the instance struct,
so that's fine; this is checked in same_slots_added).
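
(Translated up to the Python level, those rules allow and forbid things like
this -- a quick sketch; the exact TypeError message varies by version:)

class A(object):
    pass

class B(object):
    pass

a = A()
a.__class__ = B  # legal: same base, same slots added (rule 4b)

try:
    [].__class__ = dict  # incompatible layouts -- would crash if allowed
except TypeError:
    pass  # rejected with a TypeError, as it should be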

The only place the code assumes that it is dealing with heap types is
in (4b) -- same_slots_added unconditionally casts the ancestral types
to (PyHeapTypeObject*). AFAICT that's why step (1) is there, to
protect this code. But I don't think the check actually works -- step
(1) checks that the types we're trying to assign are heap types, but
this is no guarantee that the *ancestral* types will be heap types.
[Also, the code for __bases__ assignment appears to also call into
this code with no heap type checks at all.] E.g., I think if you do

class MyList(list):
    __slots__ = ()

class MyDict(dict):
    __slots__ = ()

MyList().__class__ = MyDict

then you'll end up in same_slots_added casting PyDict_Type and
PyList_Type to PyHeapTypeObjects and then following invalid pointers
into la-la land. (The __slots__ = () is to maintain layout
compatibility with the base types; if you find builtin types that
already have __dict__ and weaklist and HAVE_GC then this example
should still work even with perfectly empty subclasses.)

Re: [Python-Dev] advice needed: best approach to enabling "metamodules"?

2014-12-02 Thread Nathaniel Smith
On Tue, Dec 2, 2014 at 9:19 AM, Antoine Pitrou  wrote:
> On Mon, 1 Dec 2014 21:38:45 +
> Nathaniel Smith  wrote:
>>
>> object_set_class is responsible for checking whether it's okay to take
>> an object of class "oldto" and convert it to an object of class
>> "newto". Basically it's goal is just to avoid crashing the interpreter
>> (as would quickly happen if you e.g. allowed "[].__class__ = dict").
>> Currently the rules (spread across object_set_class and
>> compatible_for_assignment) are:
>>
>> (1) both oldto and newto have to be heap types
>> (2) they have to have the same tp_dealloc
>> (3) they have to have the same tp_free
>> (4) if you walk up the ->tp_base chain for both types until you find
>> the most-ancestral type that has a compatible struct layout (as
>> checked by equiv_structs), then either
>>    (4a) these ancestral types have to be the same, OR
>>    (4b) these ancestral types have to have the same tp_base, AND they
>> have to have added the same slots on top of that tp_base (e.g. if you
>> have class A(object): pass and class B(object): pass then they'll both
>> have added a __dict__ slot at the same point in the instance struct,
>> so that's fine; this is checked in same_slots_added).
>>
>> The only place the code assumes that it is dealing with heap types is
>> in (4b)
>
> I'm not sure. Many operations are standardized on heap types that can
> have arbitrary definitions on static types (I'm talking about the tp_
> methods). You'd have to review them to double check.

Reading through the list of tp_ methods I can't see any others that
look problematic. The finalizers are kinda intimate, but I think
people would expect that if you swap an instance's type to something
that has a different __del__ method then it's the new __del__ method
that'll be called. If we wanted to be really careful we should perhaps
do something cleverer with tp_is_gc, but so long as type objects are
the only objects that have a non-trivial tp_is_gc, and the tp_is_gc
call depends only on their tp_flags (which are unmodified by __class__
assignment), then we should still be safe (and anyway this is
orthogonal to the current issues).

> For example, a heap type's tp_new increments the type's refcount, so
> you have to adjust the instance refcount if you cast it from a non-heap
> type to a heap type, and vice-versa (see slot_tp_new()).

Right, fortunately this is easy :-).

> (this raises the interesting question "what happens if you assign to
> __class__ from a __del__ method?")

subtype_dealloc actually attempts to take this possibility into
account -- see the comment "Extract the type again; tp_del may have
changed it". I'm not at all sure that it's handling is *correct* --
there's a bunch of code that references 'type' between the call to
tp_del and this comment, and there's code after the comment that
references 'base' without recalculating it. But it is there :-)

>> -- same_slots_added unconditionally casts the ancestral types
>> to (PyHeapTypeObject*). AFAICT that's why step (1) is there, to
>> protect this code. But I don't think the check actually works -- step
>> (1) checks that the types we're trying to assign are heap types, but
>> this is no guarantee that the *ancestral* types will be heap types.
>> [Also, the code for __bases__ assignment appears to also call into
>> this code with no heap type checks at all.] E.g., I think if you do
>>
>> class MyList(list):
>>     __slots__ = ()
>>
>> class MyDict(dict):
>>     __slots__ = ()
>>
>> MyList().__class__ = MyDict
>>
>> then you'll end up in same_slots_added casting PyDict_Type and
>> PyList_Type to PyHeapTypeObjects and then following invalid pointers
>> into la-la land. (The __slots__ = () is to maintain layout
>> compatibility with the base types; if you find builtin types that
>> already have __dict__ and weaklist and HAVE_GC then this example
>> should still work even with perfectly empty subclasses.)
>>
>> Okay, so suppose we move the heap type check (step 1) down into
>> same_slots_added (step 4b), since AFAICT this is actually more correct
>> anyway. This is almost enough to enable __class__ assignment on
>> modules, because the cases we care about will go through the (4a)
>> branch rather than (4b), so the heap type thing is irrelevant.
>>
>> The remaining problem is the requirement that both types have the same
>> tp_dealloc (step 2). ModuleType itself has tp_dealloc ==
>> module_dealloc, while all(?) heap types have tp_dealloc ==
>> subtype_dealloc. [...]

Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition

2014-12-10 Thread Nathaniel Smith
On 10 Dec 2014 17:16, "Ian Cordasco"  wrote:
>
> On Wed, Dec 10, 2014 at 11:10 AM, Donald Stufft  wrote:
> >
> > On Dec 10, 2014, at 11:59 AM, Bruno Cauet  wrote:
> >
> > Hi all,
> > Last year a survey was conducted on python 2 and 3 usage.
> > Here is the 2014 edition, slightly updated (from 9 to 11 questions).
> > It should not take you more than 1 minute to fill. I would be pleased
> > if you took that time.
> >
> > Here's the url: http://goo.gl/forms/tDTcm8UzB3
> > I'll publish the results around the end of the year.
> >
> > Last year results: https://wiki.python.org/moin/2.x-vs-3.x-survey
> >
> >
> > Just going to say http://d.stufft.io/image/0z1841112o0C is a hard
> > question to answer, since most code I write is both.
> >
>
> The same holds for me.

That question appears to have just grown a "compatible with both" option.

It might make sense to add a similar option to the following question about
what you use for personal projects.

-n


Re: [Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)

2015-02-13 Thread Nathaniel Smith
On 13 Feb 2015 02:09, "Victor Stinner"  wrote:
>
> An alternative is to add a new _scandir.c module to host the new C
> code, and share some code with posixmodule.c: remove "static" keyword
> from required C functions (functions to convert Windows attributes to
> a os.stat_result object).

Hopefully not too annoying question from an outsider: has cpython's build
system added the necessary bits to do this in a safe, portable,
non-symbol-namespace-polluting way? E.g. using -fvisibility=hidden on Linux?

(I'm partially wondering because until very recently numpy was built by
concatenating all the different c files together and compiling that,
because that was the only portable way to let different files share access
to symbols without also exporting those symbols publicly from the resulting
module shared objects. And numpy supports a lot fewer platforms than
cpython proper...)

-n


Re: [Python-Dev] boxing and unboxing data types

2015-03-08 Thread Nathaniel Smith
On Mar 8, 2015 9:13 PM, "Steven D'Aprano"  wrote:
>
> There's no built-in way of calling __index__ that I know of (no
> equivalent to int(obj)),

There's operator.index(obj), at least.

> but slicing at the very least will call it,
> e.g. seq[a:] will call type(a).__index__.
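
(Both halves of that are easy to demonstrate:)

import operator

class Two:
    def __index__(self):
        return 2

t = Two()
print(operator.index(t))  # 2 -- the built-in way to call __index__
print("abcdef"[t:])       # 'cdef' -- slicing calls __index__ too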

-n


Re: [Python-Dev] Use python -s as default shebang for system python executables/daemons

2015-03-23 Thread Nathaniel Smith
On Mar 23, 2015 8:15 AM, "Antoine Pitrou"  wrote:
>
> On Mon, 23 Mar 2015 08:06:13 -0700
> Toshio Kuratomi  wrote:
> > >
> > > I really think Donald has a good point when he suggests a specific
> > > virtualenv for system programs using Python.
> > >
> > The isolation is what we're seeking but I think the amount of work
> > required and the added complexity for the distributions will make that
> > hard to get distributions to sign up for.
> >
> > If someone had the time to write a front end to install packages into
> > a single "system-wide isolation unit" whose backend was a virtualenv we
> > might be able to get distributions on-board with using that.
>
> I don't think we're asking distributions anything. We're suggesting a
> possible path, but it's not python-dev's job to dictate distributions
> how they should package Python.
>
> The virtualenv solution has the virtue that any improvement we might
> put in it to help system packagers would automatically benefit everyone.
> A specific "system Python" would not.
>
> > The front end would need to install software so that you can still
> > invoke /usr/bin/system-application and "system-application" would take
> > care of
> > activating the virtualenv.  It would need to be about as simple to build
> > as the present python2 setup.py build/install with the flexibility in
> > options that the distros need to install into FHS approved paths.  Some
> > things like man pages, locale files, config files, and possibly other
data
> > files might need to be installed outside of the virtualenv directory.
>
> Well, I don't understand what difference a virtualenv would make.
> Using a virtualenv amounts to invoking a different interpreter path.
> The rest of the filesystem (man pages locations, etc.) is still
> accessible in the same way. But I may miss something :-)

The main issue that jumps to my mind is that 'yum/apt-get install
some-python-package' should install it into both the base python
interpreter and the system virtualenv, but that 'sudo pip install
some-python-package' should install into only the base interpreter but not
the system virtualenv. (Even if those two commands are run in sequence with
different versions of some-python-package.) This seems potentially complex.

-n


Re: [Python-Dev] [python-committers] Do we need to sign Windows files with GnuPG?

2015-04-03 Thread Nathaniel Smith
On Apr 3, 2015 5:50 PM, "Donald Stufft"  wrote:
>
>
> > On Apr 3, 2015, at 6:38 PM, M.-A. Lemburg  wrote:
> >
> > On 04.04.2015 00:14, Steve Dower wrote:
> >> The thing is, that's exactly the same goodness as Authenticode gives,
> >> except everyone gets that for free and meanwhile you're the only one who
> >> has admitted to using GPG on Windows :)
> >>
> >> Basically, what I want to hear is that GPG sigs provide significantly
> >> better protection than hashes (and I can provide better than MD5 for all
> >> files if it's useful), taking into consideration that (I assume) I'd have
> >> to obtain a signing key for GPG and unless there's a CA involved like there
> >> is for Authenticode, there's no existing trust in that key.
> >
> > Hashes only provide checks against file corruption (and then
> > only if you can trust the hash values). GPG provides all the
> > benefits of public key encryption on arbitrary files (not just
> > code).
> >
> > The main benefit in case of downloadable installers is to
> > be able to make sure that the files are authentic, meaning that
> > they were created and signed by the people listed as packagers.
> >
> > There is no CA infrastructure involved as for SSL certificates
> > or Authenticode, but it's easy to get the keys from key servers
> > given the key signatures available from python.org's download
> > pages.
>
> FTR if we’re relying on people to get the GPG keys from the download
> pages then there’s no additional benefit over just using a hash
> published on the same page.
>
> In order to get additional benefit we’d need to get Steve’s key
> signed by enough people to get him into the strong set.

I don't think that's true -- e.g. people who download the key for checking
3.5.0 will still have it when 3.5.1 is released, and notice if something
silently changes. In general distributing a key id widely on webpages /
mailing lists / using it consistently over multiple releases all increase
security, even if they fall short of perfect. Even the web of trust isn't
particularly trustworthy, it's just useful because it's harder to attack
two targets (the webserver and the WoT) than it is to attack one.

In any case, getting his key into the strong set ought to be trivial given
that pycon is next week.

-n


Re: [Python-Dev] [python-committers] Do we need to sign Windows files with GnuPG?

2015-04-04 Thread Nathaniel Smith
On Sat, Apr 4, 2015 at 6:07 PM, Steve Dower  wrote:
> There's no problem, per se, but initially it was less trouble to use the
> trusted PSF certificate and native support than to add an extra step using a
> program I don't already use and trust, am restricted in use by my employer
> (because of the license and the fact there are alternatives), and developing
> the trust in a brand new certificate.
>
> Eventually the people saying "do it" will win through sheer persistence,
> since I'll get sick of trying to get a more detailed response and just
> concede. Not sure if that's how we want to be running the project though...

I don't get the impression that there's any particularly detailed
rationale that people aren't giving you; it's just that to the average
python-dev denizen, gpg-signing seems to provide some mild benefits
with no downside. The certificate trust issue isn't a downside,
just a mild dilution of the upside. And I suspect python-dev generally
doesn't put much weight on the extra effort required (release managers
have all been using gpg for decades, it's pretty trivial), or see any
reason why Microsoft's internal GPL-hate should have any effect on the
PSF's behaviour. Though it's kinda inconvenient for you, obviously. (I
guess you could call Larry or someone, read them a hash over the
phone, and then have them create the actual gpg signatures.)

-n


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Apr 29, 2015 11:49 AM, "Yury Selivanov"  wrote:
>
> Hi Ethan,
>
>
> On 2015-04-29 2:32 PM, Ethan Furman wrote:
>>
>> On 04/29, Yury Selivanov wrote:
>>>
>>> On 2015-04-29 1:25 PM, Ethan Furman wrote:

>>>> cannot also just work and be the same as the parenthesized
>>>> version.
>>>
>>> Because it does not make any sense.
>>
>> I obviously don't understand your position that "it does not make
>> any sense" -- perhaps you could explain a bit?
>>
>> What I see is a suspension point that is waiting for the results of
>> coro(), which will be negated (and returned/assigned/whatever).
>> What part of that doesn't make sense?
>>
>
> Because you want operators to be resolved in the
> order you see them, generally.
>
> You want '(await -fut)' to:
>
> 1. Suspend on fut;
> 2. Get the result;
> 3. Negate it.
>
> This is a non-obvious thing. I would myself interpret it
> as:
>
> 1. Get fut.__neg__();
> 2. await on it.
>
> So I want to make this syntactically incorrect:

As a bystander, I don't really care either way about whether await -fut is
syntactically valid (since like you say it's semantically nonsense
regardless and no one will ever write it). But I would rather like to
actually know what the syntax actually is, not just have a list of examples
(which kinda gives me perl flashbacks). Is there any simple way to state
what the rules for parsing await are? Or do I just have to read the parser
code if I want to know that?

(I suspect this may also be the impetus behind Greg's request that it just
be treated the same as unary minus. IMHO it matters much more that the
rules be predictable and teachable than that they allow or disallow every
weird edge case in exactly the right way.)

-n


Re: [Python-Dev] PEP 492: What is the real goal?

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 1:14 PM, Skip Montanaro
 wrote:
>
> On Wed, Apr 29, 2015 at 2:42 PM, Yury Selivanov 
> wrote:
>>
>> Anyways, I'd be OK to start using a new term, if "coroutine" is
>> confusing.
>
>
> According to Wikipedia, term "coroutine" was first coined in 1958, so
> several generations of computer science graduates will be familiar with the
> textbook definition. If your use of "coroutine" matches the textbook
> definition of the term, I think you should continue to use it instead of
> inventing new names which will just confuse people new to Python.

IIUC the problem is that Python has or will have a number of different
things that count as coroutines by that classic CS definition,
including generators, "async def" functions, and in general any object
that implements the same set of methods as one or both of these
objects, or possibly inherits from a certain abstract base class. It
would be useful to have some terms to refer specifically to async def
functions and the await protocol as opposed to generators and the
iterator protocol, and "coroutine" does not make this distinction.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 3:46 PM, Greg Ewing  wrote:
> Yury Selivanov wrote:
>>
>> I'm not sure
>> why Greg is pushing his Grammar idea so aggressively.
>
>
> Because I believe that any extra complexity in the grammar
> needs a very strong justification. It's complexity in the
> core language, like a new keyword, so it puts a burden on
> everyone's brain.
>
> Saying "I don't think anyone would ever need to write this,
> therefore we should disallow it" is not enough, given that
> there is a substantial cost to disallowing it.
>
> If you don't think there's a cost, consider that we *both*
> seem to be having trouble predicting the consequences of
> your proposed syntax, and you're the one who invented it.
> That's not a good sign!

FWIW, now that I've seen the precedence table in the updated PEP, it
seems really natural to me:
   https://www.python.org/dev/peps/pep-0492/#updated-operator-precedence-table
According to that, "await" is just a prefix operator that binds more
tightly than any arithmetic operation, but less tightly than
indexing/funcall/attribute lookup, which seems about right.

However, if what I just wrote were true, then that would mean that
"await -foo" and "await await foo" would be syntactically legal
(though useless). The fact that they apparently are *not* legal means
that in fact there is still some weird thing going on in the syntax
that I don't understand. And the PEP gives no further details, it just
suggests I go read the parser generator source.

My preference would be that the PEP be updated so that my one-sentence
summary above became correct. But like Guido, I don't necessarily care
about the exact details all that much. What I do feel strongly about
is that whatever syntax we end up with, there should be *some*
accurate human-readable description of *what it is*. AFAICT the PEP
currently doesn't have that.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 4:48 PM, Yury Selivanov  wrote:
> Nathaniel,
>
> On 2015-04-29 7:35 PM, Nathaniel Smith wrote:
>>
>> What I do feel strongly about
>> is that whatever syntax we end up with, there should be *some*
>> accurate human-readable description of *what it is*. AFAICT the PEP
>> currently doesn't have that.
>
> How to define human-readable description of how unary
> minus operator works?

Hah, good question :-). Of course we all learned how to parse
arithmetic in school, so perhaps it's a bit cheating to refer to that
knowledge. Except of course basically all our users *do* have that
knowledge (or else are forced to figure it out anyway). So I would be
happy with a description of "await" that just says "it's like unary
minus but higher precedence".

Even if we put aside our trained intuitions about arithmetic, I think
it's correct to say that the way unary minus is parsed is: everything
to the right of it that has a tighter precedence gets collected up and
parsed as an expression, and then it takes that expression as its
argument. Still pretty simple.
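
(Unary minus already works exactly like that:)

print(-2 ** 2)    # -4: ** binds tighter, so this parses as -(2 ** 2)
print((-2) ** 2)  # 4
print(- - 2)      # 2: equal-precedence prefix operators stack freely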

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-29 Thread Nathaniel Smith
On Wed, Apr 29, 2015 at 5:05 PM, Yury Selivanov  wrote:
> Nathaniel,
>
> On 2015-04-29 7:58 PM, Nathaniel Smith wrote:
>>
>> On Wed, Apr 29, 2015 at 4:48 PM, Yury Selivanov 
>> wrote:
>>>
>>> Nathaniel,
>>>
>>> On 2015-04-29 7:35 PM, Nathaniel Smith wrote:
>>>>
>>>> What I do feel strongly about
>>>> is that whatever syntax we end up with, there should be *some*
>>>> accurate human-readable description of *what it is*. AFAICT the PEP
>>>> currently doesn't have that.
>>>
>>> How to define human-readable description of how unary
>>> minus operator works?
>>
>> Hah, good question :-). Of course we all learned how to parse
>> arithmetic in school, so perhaps it's a bit cheating to refer to that
>> knowledge. Except of course basically all our users *do* have that
>> knowledge (or else are forced to figure it out anyway). So I would be
>> happy with a description of "await" that just says "it's like unary
>> minus but higher precedence".
>>
>> Even if we put aside our trained intuitions about arithmetic, I think
>> it's correct to say that the way unary minus is parsed is: everything
>> to the right of it that has a tighter precedence gets collected up and
>> parsed as an expression, and then it takes that expression as its
>> argument. Still pretty simple.
>
>
> Well, await is defined exactly like that ;)

So you're saying that "await -fut" and "await await fut" are actually
legal syntax after all, contra what the PEP says? Because "- -fut" is
totally legal syntax, so if await and unary minus work the same...
(Again I don't care about those examples in their own right, I just
find it frustrating that I can't answer these questions without asking
you each time.)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-30 Thread Nathaniel Smith
On Apr 30, 2015 1:57 AM, "Greg Ewing"  wrote:
>
> Nathaniel Smith wrote:
>>
>> Even if we put aside our trained intuitions about arithmetic, I think
>> it's correct to say that the way unary minus is parsed is: everything
>> to the right of it that has a tighter precedence gets collected up and
>> parsed as an expression, and then it takes that expression as its
>> argument.
>
>
> Tighter or equal, actually: '--a' is allowed.
>
> This explains why Yury's syntax disallows 'await -f'.
> The 'await' operator requires something after it, but
> there's *nothing* between it and the following '-',
> which binds less tightly.
>
> So it's understandable, but you have to think a bit
> harder.
>
> Why do we have to think harder? I suspect it's because
> the notion of precedence is normally introduced to resolve
> ambiguities. Knowing that infix '*' has higher precedence
> than infix '+' tells us that 'a + b * c' is parsed as
> 'a + (b * c)' and not '(a + b) * c'.
>
> Similarly, knowing that infix '.' has higher precedence
> than prefix '-' tells us that '-a.b' is parsed as
> '-(a.b)' rather than '(-a).b'.
>
> However, giving prefix 'await' higher precedence than
> prefix '-' doesn't serve to resolve any ambiguity.
> '- await f' is parsed as '-(await f)' either way, and
> 'await f + g' is parsed as '(await f) + g' either way.
>
> So when we see 'await -f', we think we already know
> what it means. There is only one possible order for
> the operations, so it doesn't look as though precedence
> comes into it at all, and we don't consider it when
> judging whether it's a valid expression.

The other reason this threw me is that I've recently been spending time
with a shunting yard parser, and in shunting yard parsers unary prefix
operators just work in the expected way (their precedence only affects
their interaction with later binary operators; a chain of unaries is always
allowed). It's just a limitation of the parser generator tech that python
uses that it can't handle unary operators in the natural fashion. (OTOH it
can handle lots of cases that shunting yard parsers can't -- I'm not
criticizing python's choice of parser.) Once I read the new "documentation
grammar" this became much clearer.

> What's the conclusion from all this? I think it's
> that using precedence purely to disallow certain
> constructs, rather than to resolve ambiguities, leads
> to a grammar with less-than-intuitive characteristics.

The actual effect of making "await" a different precedence is to resolve
the ambiguity in
  await x ** 2

If await acted like -, then this would be
  await (x ** 2)
But with the proposed grammar, it's instead
  (await x) ** 2
Which is probably correct, and produces the IMHO rather nice invariant that
"await" binds more tightly than arithmetic in general (instead of having to
say that it binds more tightly than arithmetic *except* in this one corner
case...).

But then given the limitations of Python's parser plus the desire to
disambiguate the expression above in the given way, it becomes an arguably
regrettable, yet inevitable, consequence that
  await -fut
  await +fut
  await ~fut
become parse errors.
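
(A sketch of both behaviours as I read the proposed grammar -- illustration,
not spec:)

async def squared(fut):
    return await fut ** 2   # parses as (await fut) ** 2

# await -fut    # SyntaxError under the proposal; you'd write await (-fut)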

AFAICT these and the ** case are the only expressions where there's any
difference between Yury's proposed grammar and your proposal of treating
await like unary minus.

-n


Re: [Python-Dev] PEP 492 vs. PEP 3152, new round

2015-04-30 Thread Nathaniel Smith
On Apr 30, 2015 8:40 PM, "Guido van Rossum"  wrote:
>
> On Thu, Apr 30, 2015 at 8:30 PM, Nathaniel Smith  wrote:
>>
>> The actual effect of making "await" a different precedence is to resolve
>> the ambiguity in
>>
>>   await x ** 2
>>
>> If await acted like -, then this would be
>>   await (x ** 2)
>> But with the proposed grammar, it's instead
>>   (await x) ** 2
>> Which is probably correct, and produces the IMHO rather nice invariant
>> that "await" binds more tightly than arithmetic in general (instead of
>> having to say that it binds more tightly than arithmetic *except* in this
>> one corner case...)
>
> Correct.
>>
>> AFAICT these and the ** case are the only expressions where there's any
>> difference between Yury's proposed grammar and your proposal of treating
>> await like unary minus. But then given the limitations of Python's parser
>> plus the desire to disambiguate the expression above in the given way, it
>> becomes an arguably regrettable, yet inevitable, consequence that
>>
>>   await -fut
>>   await +fut
>>   await ~fut
>> become parse errors.
>
>  Why is that regrettable? Do you have a plan for overloading one of those
> on Futures? I personally consider it a feature that you can't do that. :-)

I didn't say it was regrettable, I said it was arguably regrettable. For
proof, see the last week of python-dev ;-).

(I guess all else being equal it would be nice if unary operators could
stack arbitrarily, since that really is the more natural parse rule IMO and
also if things had worked that way then I would have spent this thread less
confused. But this is a pure argument from elegance. In practice there's
obviously no good reason to write "await -fut" or "-not x", so meh,
whatever.)

-n


Re: [Python-Dev] PEP 492: async/await in Python; version 5

2015-05-05 Thread Nathaniel Smith
On May 5, 2015 12:40 PM, "Jim J. Jewett"  wrote:
>
>
> On Tue May 5 18:29:44 CEST 2015, Yury Selivanov posted an updated PEP492.
>
> Where are the following over-simplifications wrong?
>
[...snip...]
>
> [Note that the actual PEP uses iteration over the results of a new
> __await__ magic method, rather than .result on the object itself.
> I couldn't tell whether this was for explicit marking, or just for
> efficiency in avoiding future creation.]
>
> (4)  "await EXPR" is just syntactic sugar for EXPR.result
>
> except that, by being syntax, it better marks locations where
> unrelated tasks might have a chance to change shared data.
>
> [And that, as currently planned, the result of an await isn't
> actually the result; it is an iterator of results.]

This is where you're missing a key idea. (And I agree that more high-level
docs are very much needed!) Remember that this is just regular single
threaded python code, so just writing EXPR.result cannot possibly cause the
current task to pause and another one to start running, and then magically
switch back somehow when the result does become available. Imagine trying
to implement a .result attribute that does that -- it's impossible.

Writing 'x = await socket1.read(1)' is actually equivalent to writing a
little loop like:

while True:
    # figure out what we need to happen to make progress
    needed = "data from socket 1"
    # suspend this function,
    # and send the main loop a message telling it what we need
    reply = (yield needed)
    # okay, the main loop woke us up again
    # let's see if they've sent us back what we asked for
    if reply.type == "data from socket 1":
        # got it!
        x = reply.payload
        break
    else:
        # if at first you don't succeed...
        continue

(Now stare at the formal definition of 'yield from' until you see how it
maps onto the above... And if you're wondering why we need a loop, think
about the case where instead of calling socket.read we're calling http.get
or something that requires multiple steps to complete.)

So there actually is semantically no iterator here -- the thing that looks
like an iterator is actually the chatter back and forth between the
lower-level code and the main loop that is orchestrating everything. Then
when that's done, it returns the single result.
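
Here's a runnable toy version of that chatter (socket names are made
up; types.coroutine is the real 3.5+ machinery that PEP 492 builds on):

import types

@types.coroutine
def read_socket1():
    # low-level awaitable: ask the loop for what we need, check the reply
    while True:
        reply_type, payload = yield "data from socket 1"
        if reply_type == "data from socket 1":
            return payload

async def task():
    x = await read_socket1()
    print("task got:", x)

# A toy "main loop", driving the coroutine by hand:
coro = task()
needed = coro.send(None)           # run until the first yield
print("loop sees request for:", needed)
try:
    coro.send((needed, b"hello"))  # wake the task with what it asked for
except StopIteration:
    pass                           # task finished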

-n


Re: [Python-Dev] PEP 492: async/await in Python; version 4

2015-05-05 Thread Nathaniel Smith
On May 5, 2015 2:14 PM, "Guido van Rossum"  wrote:
>
> In the PEP 492 world, these concepts map as follows:
>
> - Future translates to "something with an __await__ method" (and asyncio
Futures are trivially made compliant by defining Future.__await__ as an
alias for Future.__iter__);
>
> - "asyncio coroutine" maps to "PEP 492 coroutine object" (either defined
with `async def` or a generator decorated with @types.coroutine -- note
that @asyncio.coroutine incorporates the latter);
>
> - "either of the above" maps to "awaitable".

Err, aren't the first and third definitions above identical?

Surely we want to say: an async def function is a convenient shorthand for
creating a custom awaitable (exactly like how generators are a convenient
shorthand for creating custom iterators), and a Future is-an awaitable that
also adds some extra methods.

-n


[Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-14 Thread Nathaniel Smith
Hi all,

While attempting to clean up some of the more squamous aspects of
numpy's operator dispatch code [1][2], I've encountered a situation
where the semantics we want and are using are possible according to
CPython-the-interpreter, but AFAICT ought not to be possible according
to Python-the-language, i.e., it's not clear to me whether it's
possible even in principle to implement an object that works the way
numpy.ndarray does in any other interpreter. Which makes me a bit
nervous, so I wanted to check if there was any ruling on this.

Specifically, the quirk we are relying on is this: in CPython, if you do

  [1, 2] * my_object

then my_object's __rmul__ gets called *before* list.__mul__,
*regardless* of the inheritance relationship between list and
type(my_object). This occurs as a side-effect of the weirdness
involved in having both tp_as_number->nb_multiply and
tp_as_sequence->sq_repeat in the C API -- when evaluating "a * b",
CPython tries a's nb_multiply, then b's nb_multiply, then a's
sq_repeat, then b's sq_repeat. Since list has an sq_repeat but not an
nb_multiply, this means that my_object's nb_multiply gets called
before any list method.

Here's an example demonstrating how weird this is. list.__mul__ wants
an integer, and by "integer" it means "any object with an __index__
method". So here's a class that list is happy to be multiplied by --
according to the ordinary rules for operator dispatch, in the example
below Indexable.__mul__ and __rmul__ shouldn't even get a look-in:

In [3]: class Indexable(object):
   ...:     def __index__(self):
   ...:         return 2
   ...:

In [4]: [1, 2] * Indexable()
Out[4]: [1, 2, 1, 2]

But, if I add an __rmul__ method, then this actually wins:

In [6]: class IndexableWithMul(object):
   ...:     def __index__(self):
   ...:         return 2
   ...:     def __mul__(self, other):
   ...:         return "indexable forward mul"
   ...:     def __rmul__(self, other):
   ...:         return "indexable reverse mul"

In [7]: [1, 2] * IndexableWithMul()
Out[7]: 'indexable reverse mul'

In [8]: IndexableWithMul() * [1, 2]
Out[8]: 'indexable forward mul'

NumPy arrays, of course, correctly define both an __index__ method
(which raises an error on general arrays but coerces to int for arrays
that contain exactly one integer), and an nb_multiply slot which
accepts lists and performs elementwise multiplication:

In [9]: [1, 2] * np.array(2)
Out[9]: array([2, 4])

And that's all great! Just what we want. But the only reason this is
possible, AFAICT, is that CPython 'list' is a weird type with
undocumented behaviour that you can't actually define using pure
Python code.

Should I be worried?

-n

[1] https://github.com/numpy/numpy/pull/5864
[2] https://github.com/numpy/numpy/issues/5844

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-14 Thread Nathaniel Smith
On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
> I expect you can make something that behaves like list by defining __mul__
> and __rmul__ and returning NotImplemented.

Hmm, it's fairly tricky, and part of the trick is that you can never
return NotImplemented (because you have to pretty much take over and
entirely replace the normal dispatch rules inside __mul__ and
__rmul__), but see attached for something I think should work.

So I guess this is just how Python's list, tuple, etc. work, and PyPy
and friends need to match...
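
(The attachment isn't preserved in this archive; a rough sketch of the
kind of take-over-the-dispatch trick described, using a hypothetical
FakeList, might look like:

class FakeList:
    """Sketch: emulate list's quirky '*' dispatch in pure Python."""
    def __init__(self, data):
        self.data = list(data)

    def __repr__(self):
        return f"FakeList({self.data!r})"

    def __mul__(self, other):
        # Never return NotImplemented -- re-implement the dispatch by
        # hand: try the other operand's __rmul__ first (mimicking
        # CPython trying b's nb_multiply before list's sq_repeat),
        # then fall back to sequence repetition via __index__.
        rmul = getattr(type(other), "__rmul__", None)
        if rmul is not None:
            result = rmul(other, self)
            if result is not NotImplemented:
                return result
        return FakeList(self.data * type(other).__index__(other))

class WithRmul:
    def __index__(self):
        return 2
    def __rmul__(self, other):
        return "reverse mul wins"

print(FakeList([1, 2]) * WithRmul())   # 'reverse mul wins'
print(FakeList([1, 2]) * 3)            # FakeList([1, 2, 1, 2])

A full emulation would need the same treatment in __rmul__ plus the
right subclass checks -- which is exactly the "entirely replace the
normal dispatch rules" problem.)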

-n

> On Thursday, May 14, 2015, Stefan Richthofer 
> wrote:
>>
>> >>Should I be worried?
>>
>> You mean should *I* be worried ;)
>>
>> Stuff like this is highly relevant for JyNI, so thanks very much for
>> clarifying this
>> subtle behavior. It went onto my todo-list right now to ensure that JyNI
>> will emulate
>> this behavior as soon as I am done with gc-support. (Hopefully it will be
>> feasible,
>> but I can only tell in half a year or so since there are currently other
>> priorities.)
>> Still, this "essay" potentially will save me a lot of time.
>>
>> So, everybody please feel encouraged to post things like this as they come
>> up. Maybe
>> there could be kind of a pitfalls-page somewhere in the docs collecting
>> these things.
>>
>> Best
>>
>> Stefan
>>
>>
>> > Sent: Friday, 15 May 2015 at 02:45
>> > From: "Nathaniel Smith" 
>> > To: "Python Dev" 
>> > Subject: [Python-Dev] Python-versus-CPython question for __mul__
>> > dispatch
>> >
>> > [...snip...]

Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-15 Thread Nathaniel Smith
On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith  wrote:
> On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
>> I expect you can make something that behaves like list by defining __mul__
>> and __rmul__ and returning NotImplemented.
>
> Hmm, it's fairly tricky, and part of the trick is that you can never
> return NotImplemented (because you have to pretty much take over and
> entirely replace the normal dispatch rules inside __mul__ and
> __rmul__), but see attached for something I think should work.
>
> So I guess this is just how Python's list, tuple, etc. work, and PyPy
> and friends need to match...

For the record, it looks like PyPy does already have a hack to
implement this -- they do it by having a hidden flag on the built-in
sequence types which the implementations of '*' and '+' check for, and
if it's found it triggers a different rule for dispatching to the
__op__ methods:

https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa/pypy/objspace/descroperation.py?at=default#cl-692

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-17 Thread Nathaniel Smith
On Sat, May 16, 2015 at 1:31 AM, Nick Coghlan  wrote:
> On 16 May 2015 at 07:35, Nathaniel Smith  wrote:
>> On Thu, May 14, 2015 at 11:53 PM, Nathaniel Smith  wrote:
>>> On Thu, May 14, 2015 at 9:29 PM, Guido van Rossum  wrote:
>>>> I expect you can make something that behaves like list by defining __mul__
>>>> and __rmul__ and returning NotImplemented.
>>>
>>> Hmm, it's fairly tricky, and part of the trick is that you can never
>>> return NotImplemented (because you have to pretty much take over and
>>> entirely replace the normal dispatch rules inside __mul__ and
>>> __rmul__), but see attached for something I think should work.
>>>
>>> So I guess this is just how Python's list, tuple, etc. work, and PyPy
>>> and friends need to match...
>>
>> For the record, it looks like PyPy does already have a hack to
>> implement this -- they do it by having a hidden flag on the built-in
>> sequence types which the implementations of '*' and '+' check for, and
>> if it's found it triggers a different rule for dispatching to the
>> __op__ methods:
>> 
>> https://bitbucket.org/pypy/pypy/src/a1a494787f4112e42f50c6583e0fea18db3fb4fa/pypy/objspace/descroperation.py?at=default#cl-692
>
> Oh, that's rather annoying that the PyPy team implemented bug-for-bug
> compatibility there, and didn't follow up on the operand precedence
> bug report to say that they had done so. We also hadn't previously
> been made aware that NumPy is relying on this operand precedence bug
> to implement publicly documented API behaviour, so fixing it *would*
> break end user code :(

I don't think any of us were aware of it either :-).

It is a fairly obscure case -- it only comes up specifically if you
have a single-element integer array that you are trying to multiply by
a list that you expect to be auto-coerced to an array. If Python
semantics were such that this became impossible to handle correctly
then we would survive. (We've certainly survived worse, e.g.
arr[array_of_indices] += 1 silently gives the wrong/unexpected result
when array_of_indices has duplicate entries, and this bites people
constantly. Unfortunately I can't see any reasonable way to fix this
within Python's semantics, so... oh well.)
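
For concreteness (np.add.at is the standard numpy workaround; it
applies the operation once per index, duplicates included):

import numpy as np

arr = np.zeros(3, dtype=int)
idx = np.array([0, 0, 1])     # note the duplicate index 0
arr[idx] += 1
print(arr)                    # [1 1 0] -- the duplicate increment is lost

unbuf = np.zeros(3, dtype=int)
np.add.at(unbuf, idx, 1)      # unbuffered in-place op
print(unbuf)                  # [2 1 0]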

But yeah, given that we're at a point where list dispatch actually has
worked this way forever and across multiple interpreter
implementations, I think it's de facto going to end up part of the
language specification unless someone does something pretty quick...

> I guess that means someone in the numeric community will need to write
> a PEP to make this "try the other operand first" "feature" part of the
> language specification, so that other interpreters can implement it up
> front, rather than all having to come up with their own independent
> custom hacks just to make NumPy work.

I'll make a note...

> P.S. It would also be nice if someone could take on the PEP for a
> Python level buffer API for 3.6: http://bugs.python.org/issue13797

At a guess, if you want to find people who have this itch strong
enough to try scratching it, then probably numpy users are actually
not your best bet, b/c if you have numpy then you already have
workarounds. In particular, numpy still supports a legacy Python level
buffer export API:

   http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#python-side

So if all you want is to hand a buffer to numpy (rather than to an
arbitrary PEP 3118 consumer) then this works fine, and if you do need
an arbitrary PEP 3118 consumer then you can use numpy as an adaptor
(use __array_interface__ to convert your object to ndarray -> ndarray
supports the PEP 3118 API).
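
A sketch of that adaptor pattern (LegacyBuffer is a made-up class;
__array_interface__ and np.asarray are real numpy APIs):

import numpy as np

class LegacyBuffer:
    """Hypothetical object exporting data via the legacy interface."""
    def __init__(self):
        self._arr = np.arange(6.0)
        # The legacy export is just an attribute holding a dict --
        # no C-level buffer machinery required.
        self.__array_interface__ = self._arr.__array_interface__

obj = LegacyBuffer()
a = np.asarray(obj)    # numpy consumes __array_interface__ directly
mv = memoryview(a)     # the resulting ndarray is a PEP 3118 exporter
print(a, mv.format, mv.shape)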

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Python-Dev] PEP 556: Threaded garbage collection

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 12:13 PM, Antoine Pitrou  wrote:
> On Fri, 08 Sep 2017 12:04:10 -0700
> Benjamin Peterson  wrote:
>> I like it overall.
>>
>> - I was wondering what happens during interpreter shutdown. I see you
>> have that listed as a open issue. How about simply shutting down the
>> finalization thread and not guaranteeing that finalizers are actually
>> ever run à la Java?
>
> I don't know.  People generally have expectations towards stuff being
> finalized properly (especially when talking about files etc.).
> Once the first implementation is devised, we will know more about
> what's workable (perhaps we'll have to move _PyGC_Fini earlier in the
> shutdown sequence?  perhaps we'll want to switch back to serial mode
> when shutting down?).

PyPy just abandons everything when shutting down, instead of running
finalizers. See the last paragraph of :
http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies

So that might be a useful source of experience.

On another note, I'm going to be that annoying person who suggests
massively extending the scope of your proposal. Feel free to throw
things at me or whatever.

Would it make sense to also move signal handlers to run in this
thread? Those are the other major source of nasty re-entrancy
problems.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 549 v2: now titled Instance Descriptors

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 1:45 PM, Ethan Furman  wrote:
> On 09/08/2017 12:44 PM, Larry Hastings wrote:
>
>> I've updated PEP 549 with a new title--"Instance Descriptors" is a better
>> name than "Instance Properties"--and to
>> clarify my rationale for the PEP.  I've also updated the prototype with
>> code cleanups and a new type:
>> "collections.abc.InstanceDescriptor", a base class that allows user
>> classes to be instance descriptors.
>
>
> I like the new title, I'm +0 on the PEP itself, and I have one correction
> for the PEP:  we've had the ability to simulate module properties for ages:
>
> Python 2.7.6 (default, Oct 26 2016, 20:32:47)
> [GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> --> import module_class
> --> module_class.hello
> 'hello'
> --> module_class.hello = 'hola'
> --> module_class.hello
> 'hola'
>
> And the code:
>
> class ModuleClass(object):
>     @property
>     def hello(self):
>         try:
>             return self._greeting
>         except AttributeError:
>             return 'hello'
>     @hello.setter
>     def hello(self, value):
>         self._greeting = value
>
> import sys
> sys.modules[__name__] = ModuleClass()
>
> I will admit I don't see what reassigning the __class__ attribute on a
> module did for us.

If you have an existing package that doesn't replace itself in
sys.modules, then it's difficult and risky to switch to that form --
don't think of toy examples, think of django/__init__.py or
numpy/__init__.py. You have to rewrite the whole export logic, and you
have to figure out what to do with things like submodules that import
from the parent module before the swaparoo happens; you can get skew
issues between the original module namespace and the replacement class
namespace, etc. The advantage of the __class__ assignment trick (as
compared to what we had before) is that it lets you easily and safely
retrofit this kind of magic onto existing packages.
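
The trick itself, for comparison (Python 3.5+, where assigning to a
module's __class__ is allowed; "mypkg" is a placeholder name):

# mypkg/__init__.py
import sys
import types

class _Mod(types.ModuleType):
    @property
    def hello(self):
        return getattr(self, "_greeting", "hello")

    @hello.setter
    def hello(self, value):
        self._greeting = value

# The module object, its sys.modules entry, and every existing
# reference to it all stay the same; only its class changes.
sys.modules[__name__].__class__ = _Mod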

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-08 Thread Nathaniel Smith
On Fri, Sep 8, 2017 at 8:54 PM, Michel Desmoulin
 wrote:
>
> On 09/09/2017 at 01:28, Stefan Krah wrote:
>> Still, the argument "who uses subinterpreters?" of course still remains.
>
> For now, nobody. But if we expose it and web frameworks manage to create
> workers as fast as multiprocessing and as cheap as threading, you will
> find a lot of people starting to want to use it.

To temper expectations a bit here, it sounds like the first version
might be more like: as slow as threading (no multicore), as expensive
as multiprocessing (no shared memory), and -- on Unix -- slower to
start than either of them (no fork).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 9, 2017 9:07 AM, "Nick Coghlan"  wrote:


To immediately realise some level of efficiency benefits from the
shared memory space between the main interpreter and subinterpreters,
I also think these low level FIFOs should be defined as accepting any
object that supports the PEP 3118 buffer protocol, and emitting
memoryview() objects on the receiving end, rather than being bytes-in,
bytes-out.


Is your idea that this memoryview would refer directly to the sending
interpreter's memory (as opposed to a copy into some receiver-owned
buffer)? If so, then how do the two subinterpreters coordinate the buffer
release when the memoryview is closed?

-n


Re: [Python-Dev] PEP 554 v2 (new "interpreters" module)

2017-09-09 Thread Nathaniel Smith
On Sep 8, 2017 4:06 PM, "Eric Snow"  wrote:


   run(code):

  Run the provided Python code in the interpreter, in the current
  OS thread.  If the interpreter is already running then raise
  RuntimeError in the interpreter that called ``run()``.

  The current interpreter (which called ``run()``) will block until
  the subinterpreter finishes running the requested code.  Any
  uncaught exception in that code will bubble up to the current
  interpreter.


This phrase "bubble up" here is doing a lot of work :-). Can you elaborate
on what you mean? The text now makes it seem like the exception will just
pass from one interpreter into another, but that seems impossible – it'd
mean sharing not just arbitrary user defined exception classes but full
frame objects...

-n


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-10 Thread Nathaniel Smith
On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw  wrote:
> For PEP 553, I think it’s a good idea to support the environment variable 
> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
> some feedback.
>
> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
> sys.breakpointhook()?

Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
startup, and if it's set use it to initialize sys.breakpointhook()?
Compare to, say, $PYTHONPATH.
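
A hypothetical sketch of the check-once-at-startup model (this is not
CPython's actual bootstrap code; names are made up):

import importlib
import os
import sys

def _init_breakpointhook():
    target = os.environ.get("PYTHONBREAKPOINT")
    if not target:
        return                    # keep the default pdb-based hook
    if target == "0":
        sys.breakpointhook = lambda *a, **kw: None   # disabled
        return
    modname, _, funcname = target.rpartition(".")
    module = importlib.import_module(modname or "builtins")
    sys.breakpointhook = getattr(module, funcname)

_init_breakpointhook()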

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 557: Data Classes

2017-09-10 Thread Nathaniel Smith
Hi Eric,

A few quick comments:

Why do you even have a hash= argument on individual fields? For the whole
class, I can imagine you might want to explicitly mark a whole class as
unhashable, but it seems like the only thing you can do with the
field-level hash= argument is to create a class where the __hash__ and
__eq__ take different fields into account, and why would you ever want that?

Though honestly I can see a reasonable argument for removing the
class-level hash= option too. And even if you keep it you might want to
error on some truly nonsensical options like defining __hash__ without
__eq__. (Also watch out that Python's usual rule about defining __eq__
blocking the inheritance of __hash__ does not kick in if __eq__ is added
after the class is created.)

I've sometimes wished that attrs let me control whether it generated
equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...).
Maybe the cmp= argument should take an enum with options
none/equality-only/full?
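
(For the record: in the dataclasses API as it eventually shipped in
3.7, the class-level knob did get split into separate eq= and order=
flags rather than a single cmp=, and fields can opt out of comparison:

from dataclasses import dataclass, field

@dataclass(eq=True, order=False)   # equality methods, no ordering ones
class Item:
    name: str
    # excluded from __eq__ (and from any generated __hash__):
    cache: dict = field(default_factory=dict, compare=False)

print(Item("a") == Item("a"))      # True -- cache doesn't participate
)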

The "why not attrs" section kind of reads like "because it's too popular
and useful"?

-n

On Sep 8, 2017 08:44, "Eric V. Smith"  wrote:

Oops, I forgot the link. It should show up shortly at
https://www.python.org/dev/peps/pep-0557/.

Eric.


On 9/8/17 7:57 AM, Eric V. Smith wrote:

> I've written a PEP for what might be thought of as "mutable namedtuples
> with defaults, but not inheriting tuple's behavior" (a mouthful, but it
> sounded simpler when I first thought of it). It's heavily influenced by
> the attrs project. It uses PEP 526 type annotations to define fields.
> From the overview section:
>
> @dataclass
> class InventoryItem:
>     name: str
>     unit_price: float
>     quantity_on_hand: int = 0
>
>     def total_cost(self) -> float:
>         return self.unit_price * self.quantity_on_hand
>
> Will automatically add these methods:
>
>   def __init__(self, name: str, unit_price: float,
>                quantity_on_hand: int = 0) -> None:
>       self.name = name
>       self.unit_price = unit_price
>       self.quantity_on_hand = quantity_on_hand
>
>   def __repr__(self):
>       return f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})'
>
>   def __eq__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
>   def __ne__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
>   def __lt__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
>   def __le__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
>   def __gt__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
>   def __ge__(self, other):
>       if other.__class__ is self.__class__:
>           return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
>       return NotImplemented
>
> Data Classes saves you from writing and maintaining these functions.
>
> The PEP is largely complete, but could use some filling out in places.
> Comments welcome!
>
> Eric.
>
> P.S. I wrote this PEP when I was in my happy place.
>


Re: [Python-Dev] PEP 557: Data Classes

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:32 AM, Eric V. Smith  wrote:
> On 9/10/17 11:08 PM, Nathaniel Smith wrote:
>>
>> Hi Eric,
>>
>> A few quick comments:
>>
>> Why do you even have a hash= argument on individual fields? For the
>> whole class, I can imagine you might want to explicitly mark a whole
>> class as unhashable, but it seems like the only thing you can do with
>> the field-level hash= argument is to create a class where the __hash__
>> and __eq__ take different fields into account, and why would you ever
>> want that?
>
>
> The use case is that you have a cache, or something similar, that doesn't
> affect the object identity.

But wouldn't this just be field(cmp=False), no need to fiddle with hash=?

>> Though honestly I can see a reasonable argument for removing the
>> class-level hash= option too. And even if you keep it you might want to
>> error on some truly nonsensical options like defining __hash__ without
>> __eq__. (Also watch out that Python's usual rule about defining __eq__
>> blocking the inheritance of __hash__ does not kick in if __eq__ is added
>> after the class is created.)
>>
>> I've sometimes wished that attrs let me control whether it generated
>> equality methods (eq/ne/hash) separately from ordering methods
>> (lt/gt/...). Maybe the cmp= argument should take an enum with options
>> none/equality-only/full?
>
>
> Yeah, I've thought about this, too. But I don't have any use case in mind,
> and if it hasn't come up with attrs, then I'm reluctant to break new ground
> here.

https://github.com/python-attrs/attrs/issues/170

From that thread, it feels like part of the problem is that it's
awkward to encode this using only boolean arguments, but they're sort
of stuck with that for backcompat with how they originally defined
cmp=, hence my suggestion to consider making it an enum from the start
:-).

>> The "why not attrs" section kind of reads like "because it's too popular
>> and useful"?
>
>
> I'll add some words to that section, probably focused on typing
> compatibility. My general feeling is that attrs has some great design
> decisions, but goes a little too far (e.g., conversions, validations). As
> with most things we add, I'm trying to be as minimalist as possible, while
> still being widely useful and allowing 3rd party extensions and future
> features.

If the question is "given that we're going to add something to the
stdlib, why shouldn't that thing be attrs?" then I guess it's
sufficient to say "because the attrs developers didn't want it". But I
think the PEP should also address the question "why are we adding
something to the stdlib, instead of just recommending people install
attrs".

-n


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 5:27 PM, Barry Warsaw  wrote:
> On Sep 10, 2017, at 13:46, Nathaniel Smith  wrote:
>>
>> On Sun, Sep 10, 2017 at 12:06 PM, Barry Warsaw  wrote:
>>> For PEP 553, I think it’s a good idea to support the environment variable 
>>> $PYTHONBREAKPOINT[*] but I’m stuck on a design question, so I’d like to get 
>>> some feedback.
>>>
>>> Should $PYTHONBREAKPOINT be consulted in breakpoint() or in 
>>> sys.breakpointhook()?
>>
>> Wouldn't the usual pattern be to check $PYTHONBREAKPOINT once at
>> startup, and if it's set use it to initialize sys.breakpointhook()?
>> Compare to, say, $PYTHONPATH.
>
> Perhaps, but what would be the visible effects of that?  I.e. what would that 
> buy you?

Why is this case special enough to break the rules?

Compared to checking it on each call to sys.breakpointhook(), I guess
the two user-visible differences in behavior would be:

- whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
affects future calls. I would find it quite surprising if it did;
generally when we mutate envvars like os.environ["PYTHONPATH"] it's a
way to set things up for future child processes and doesn't affect our
process.

- whether the import happens immediately at startup, or is delayed
until the first call to breakpoint(). If it's imported once at
startup, then this adds overhead to programs that set PYTHONBREAKPOINT
but don't use it, and if the envvar is set to some nonsense then you
get an error immediately instead of at the first call to breakpoint().
These both seem like fairly minor differences to me, but maybe saving
that 30 ms or whatever of startup time is an important enough
optimization to justify the special case?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] breakpoint() and $PYTHONBREAKPOINT

2017-09-11 Thread Nathaniel Smith
On Mon, Sep 11, 2017 at 6:45 PM, Barry Warsaw  wrote:
> On Sep 11, 2017, at 18:15, Nathaniel Smith  wrote:
>
>> Compared to checking it on each call to sys.breakpointhook(), I guess
>> the two user-visible differences in behavior would be:
>>
>> - whether mutating os.environ["PYTHONBREAKPOINT"] inside the process
>> affects future calls. I would find it quite surprising if it did;
>
> Maybe, but the effect would be essentially the same as setting 
> sys.breakpointhook during the execution of your program.
>
>> - whether the import happens immediately at startup, or is delayed
>> until the first call to breakpoint().
>
> I definitely think it should be delayed until first use.  You might never hit 
> the breakpoint() in which case you wouldn’t want to pay any penalty for even 
> checking the environment variable.  And once you *do* hit the breakpoint(), 
> you aren’t caring about performance then anyway.
>
> Both points could be addressed by caching the import after the first lookup, 
> but even there I don’t think it’s worth it.  I’d invoke KISS and just look it 
> up anew each time.  That way there’s no global/static state to worry about, 
> and the feature is as flexible as it can be.
>
>> If it's imported once at
>> startup, then this adds overhead to programs that set PYTHONBREAKPOINT
>> but don't use it, and if the envvar is set to some nonsense then you
>> get an error immediately instead of at the first call to breakpoint().
>
> That’s one case I can see being useful; report an error immediately upon 
> startup rather that when you hit breakpoint().  But even there, I’m not sure. 
>  What if you’ve got some kind of dynamic debugger discovery thing going on, 
> such that the callable isn’t available until its first use?

I don't think that's a big deal? Just do the discovery logic from
inside the callable itself, instead of using some kind of magical
attribute lookup hook.

>> These both seem like fairly minor differences to me, but maybe saving
>> that 30 ms or whatever of startup time is an important enough
>> optimization to justify the special case?
>
> I’m probably too steeped in the implementation, but it feels to me like not 
> just loading the envar callable on demand makes reasoning about and using 
> this more complicated.  I think for most uses though, it won’t really matter.

I don't mind breaking from convention if you have a good reason, I
just like it for PEPs to actually write down that reasoning :-)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 549: Instance Properties (aka: module properties)

2017-09-13 Thread Nathaniel Smith
On Wed, Sep 13, 2017 at 11:49 AM, Guido van Rossum  wrote:
>  > Why not adding both? Properties do have their uses as does __getattr__.
>
> In that case I would just add __getattr__ to module.c, and add a recipe or
> perhaps a utility module that implements a __getattr__ you can put into your
> module if you want @property support. That way you can have both but you
> only need a little bit of code in module.c to check for __getattr__ and call
> it when you'd otherwise raise AttributeError.

Unfortunately I don't think this works. If there's a @property object
present in the module's instance dict, then __getattribute__ will
return it directly instead of calling __getattr__.

(I guess for full property emulation you'd also need to override
__setattr__ and __dir__, but I don't know how important that is.)

We could consider letting modules overload __getattribute__ instead of
__getattr__, but I don't think this is viable either -- a key feature
of __getattr__ is that it doesn't add overhead to normal lookups. If
you implement deprecation warnings by overloading __getattribute__,
then it makes all your users slower, even the ones who never touch the
deprecated attributes. __getattr__ is much better than
__getattribute__ for this purpose.

Alternatively we can have a recipe that implements @property support
using __class__ assignment and overriding
__getattribute__/__setattr__/__dir__, so instead of 'from
module_helper.property_emulation import __getattr__' it'd be 'from
module_helper import enable_property_emulation;
enable_property_emulation(__name__)'. Still has the slowdown problem
but it would work.
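
(For reference, a module-level __getattr__ along these lines did
eventually land as PEP 562 in Python 3.7; the deprecation use case is
roughly this, with hypothetical names:

# mymodule.py
import warnings

def new_func():
    return 42

_renamed = {"old_func": "new_func"}

def __getattr__(name):
    # only called when normal lookup fails, so hot paths pay nothing
    if name in _renamed:
        warnings.warn(f"{name} is deprecated, use {_renamed[name]}",
                      DeprecationWarning, stacklevel=2)
        return globals()[_renamed[name]]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
)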

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Nathaniel Smith
On Sep 13, 2017 9:01 PM, "Nick Coghlan"  wrote:

On 14 September 2017 at 11:44, Eric Snow 
wrote:
>send(obj):
>
>Send the object to the receiving end of the channel.  Wait until
>the object is received.  If the channel does not support the
>object then TypeError is raised.  Currently only bytes are
>supported.  If the channel has been closed then EOFError is
>raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.


I don't get it. With bytes, you can either share objects or copy them and
the user can't tell the difference, so you can change your mind later if
you want. But memoryviews require some kind of cross-interpreter strong
reference to keep the underlying buffer object alive. So if you want to
minimize object sharing, surely bytes are more future-proof.


> Handling an exception
> -
>
> ::
>
>    interp = interpreters.create()
>    try:
>        interp.run("""if True:
>            raise KeyError
>            """)
>    except KeyError:
>        print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.


It would also be reasonable to simply not return any value/exception from
run() at all, or maybe just a bool for whether there was an unhandled
exception. Any high level API is going to be injecting code on both sides
of the interpreter boundary anyway, so it can do whatever exception and
traceback translation it wants to.


> Reseting __main__
> -
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desireable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.

I was going to note that you can already do this:

interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.


This is another point where the API could reasonably say that if you want
clean namespaces then you should do that yourself (e.g. by setting up your
own globals dict and using it to execute any post-bootstrap code).
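
(E.g., against the draft API, a user-level reset that spares the
__dunder__ names could be a sketch like:

interp.run("""if True:
    g = globals()
    for name in [n for n in g if not n.startswith('__')]:
        del g[name]
    """)
)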

-n


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nathaniel Smith
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan  wrote:
> On 14 September 2017 at 15:27, Nathaniel Smith  wrote:
>> I don't get it. With bytes, you can either share objects or copy them and
>> the user can't tell the difference, so you can change your mind later if you
>> want.
>> But memoryviews require some kind of cross-interpreter strong
>> reference to keep the underlying buffer object alive. So if you want to
>> minimize object sharing, surely bytes are more future-proof.
>
> Not really, because the only way to ensure object separation (i.e no
> refcounted objects accessible from multiple interpreters at once) with
> a bytes-based API would be to either:
>
> 1. Always copy (eliminating most of the low overhead communications
> benefits that subinterpreters may offer over multiple processes)
> 2. Make the bytes implementation more complicated by allowing multiple
> bytes objects to share the same underlying storage while presenting as
> distinct objects in different interpreters
> 3. Make the output on the receiving side not actually a bytes object,
> but instead a view onto memory owned by another object in a different
> interpreter (a "memory view", one might say)
>
> And yes, using memory views for this does mean defining either a
> subclass or a mediating object that not only keeps the originating
> object alive until the receiving memoryview is closed, but also
> retains a reference to the originating interpreter so that it can
> switch to it when it needs to manipulate the source object's refcount
> or call one of the buffer methods.
>
> Yury and I are fine with that, since it means that either the sender
> *or* the receiver can decide to copy the data (e.g. by calling
> bytes(obj) before sending, or bytes(view) after receiving), and in the
> meantime, the object holding the cross-interpreter view knows that it
> needs to switch interpreters (and hence acquire the sending
> interpreter's GIL) before doing anything with the source object.
>
> The reason we're OK with this is that it means that only reading a new
> message from a channel (i.e creating a cross-interpreter view) or
> discarding a previously read message (i.e. closing a cross-interpreter
> view) will be synchronisation points where the receiving interpreter
> necessarily needs to acquire the sending interpreter's GIL.
>
> By contrast, if we allow an actual bytes object to be shared, then
> either every INCREF or DECREF on that bytes object becomes a
> synchronisation point, or else we end up needing some kind of
> secondary per-interpreter refcount where the interpreter doesn't drop
> its shared reference to the original object in its source interpreter
> until the internal refcount in the borrowing interpreter drops to
> zero.

Ah, that makes more sense.

I am nervous that allowing arbitrary memoryviews gives a *little* more
power than we need or want. I like that the current API can reasonably
be emulated using subprocesses -- it opens up the door for backports,
compatibility support on language implementations that don't support
subinterpreters, direct benchmark comparisons between the two
implementation strategies, etc. But if we allow arbitrary memoryviews,
then this requires that you can take (a) an arbitrary object, not
specified ahead of time, and (b) provide two read-write views on it in
separate interpreters such that modifications made in one are
immediately visible in the other. Subprocesses can do one or the other
-- they can copy arbitrary data, and if you warn them ahead of time
when you allocate the buffer, they can do real zero-copy shared
memory. But the combination is really difficult.

It'd be one thing if this were like a key feature that gave
subinterpreters an advantage over subprocesses, but it seems really
unlikely to me that a library won't know ahead of time when it's
filling in a buffer to be transferred, and if anything it seems like
we'd rather not expose read-write shared mappings in any case. It's
extremely non-trivial to do right [1].

tl;dr: let's not rule out a useful implementation strategy based on a
feature we don't actually need.

One alternative would be your option (3) -- you can put bytes in and
get memoryviews out, and since bytes objects are immutable it's OK.

[1] https://en.wikipedia.org/wiki/Memory_model_(programming)

>>> Handling an exception
>>> -
>> It would also be reasonable to simply not return any value/exception from
>> run() at all, or maybe just a bool for whether there was an unhandled
>> exception. Any high level API is going to be injecting code on both sides of
>> the interpreter boundary anyway, so it can do whatever exception and
>> traceback translation it wants to.

Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Sep 18, 2017 07:58, "Antoine Pitrou"  wrote:


On 18/09/2017 at 16:52, Guido van Rossum wrote:
>
> In Python 2 the traceback was not part of the exception object because
> there was (originally) no cycle GC. In Python GC we changed the awkward
> interface to something more useful, because we could depend on GC. Why
> are we now trying to roll back this feature? We should just improve GC.
> (Or perhaps you shouldn't be raising so many exceptions. :-)

Improving the GC is obviously a good thing, but what heuristic would you
have in mind that may solve the issue at hand?


I read the whole thread and I'm not sure what the issue at hand is :-).
Obviously it's nice when the refcount system is able to implicitly clean
things up in a prompt and deterministic way, but there are already tools to
handle the cases where it doesn't (ResourceWarning, context managers, ...),
and the more we encourage people to implicitly rely on refcounting, the
harder it is to optimize the interpreter or use alternative language
implementations. Why are reference cycles a problem that needs solving?

Actually the bit that I find most confusing about Victor's story is, how
can a traceback frame keep a thread alive? Is the problem that "dangling
thread" warnings are being triggered by threads that are finished and dead
but their Thread objects are still allocated? Because if so then that seems
like a bug in the warnings mechanism; there's no harm in a dead Thread
hanging around until collected, and Victor may have wasted a day debugging
an issue that wasn't a problem in the first place...

-n


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 9:50 AM, Antoine Pitrou  wrote:
> On Mon, 18 Sep 2017 09:42:45 -0700
> Nathaniel Smith  wrote:
>>
>> Obviously it's nice when the refcount system is able to implicitly clean
>> things up in a prompt and deterministic way, but there are already tools to
>> handle the cases where it doesn't (ResourceWarning, context managers, ...),
>> and the more we encourage people to implicitly rely on refcounting, [...]
>
> The thing is, we don't need to encourage them.  Having objects disposed
> of when the last visible reference vanishes is a pretty common
> expectation people have when using CPython.
>
>> Why are reference cycles a problem that needs solving?
>
> Because sometimes they are holding up costly resources in memory when
> people don't expect them to.  Such as large Numpy arrays :-)

Do we have any reason to believe that this is actually happening on a
regular basis though?

If it is then it might make sense to look at the cycle collection
heuristics; IIRC they're based on a fairly naive count of how many
allocations have been made, without regard to their size.

> And, no, there are no obvious ways to fix for users.  gc.collect()
> is much too costly to be invoked on a regular basis.
>
>> Because if so then that seems
>> like a bug in the warnings mechanism; there's no harm in a dead Thread
>> hanging around until collected, and Victor may have wasted a day debugging
>> an issue that wasn't a problem in the first place...
>
> Yes, I think Victor is getting a bit overboard with the so-called
> "dangling thread" issue.  But the underlying issue (that heavyweight
> resources can be inadvertently held up in memory up just because some
> unrelated exception was caught and silenced along the way) is a real
> one.

Simply catching and silencing exceptions doesn't create any loops -- if you do

try:
    raise ValueError
except ValueError as exc:
    raise exc

then there's no loop, because the 'exc' local gets cleared as soon as
you exit the except: block. The issue that Victor ran into with
socket.create_connection is a special case where that function saves
off the caught exception to use later.

If someone wanted to replace socket.create_connection's 'raise err' with

try:
    raise err
finally:
    del err

then I guess that would be pretty harmless...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 10:59 AM, Antoine Pitrou  wrote:
> On 18/09/2017 at 19:53, Nathaniel Smith wrote:
>>>
>>>> Why are reference cycles a problem that needs solving?
>>>
>>> Because sometimes they are holding up costly resources in memory when
>>> people don't expect them to.  Such as large Numpy arrays :-)
>>
>> Do we have any reason to believe that this is actually happening on a
>> regular basis though?
>
> Define "regular" :-)  We did get some reports on dask/distributed about it.

Caused by uncollected cycles involving tracebacks? I looked here:

  
https://github.com/dask/distributed/issues?utf8=%E2%9C%93&q=is%3Aissue%20memory%20leak

and saw some issues with cycles causing delayed collection (e.g. #956)
or the classic memory leak problem of explicitly holding onto data you
don't need any more (e.g. #1209, bpo-29861), but nothing involving
traceback cycles. It was just a quick skim though.

>> If it is then it might make sense to look at the cycle collection
>> heuristics; IIRC they're based on a fairly naive count of how many
>> allocations have been made, without regard to their size.
>
> Yes... But just because a lot of memory has been allocated isn't a good
> enough heuristic to launch a GC collection.

I'm not an expert on GC at all, but intuitively it sure seems like
allocation size might be a useful piece of information to feed into a
heuristic. Our current heuristic is just, run a small collection after
every 700 allocations, run a larger collection after 10 smaller
collections.
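
(Those knobs are visible and tunable from Python:

import gc

print(gc.get_threshold())      # (700, 10, 10) by default
# Counts of allocations minus deallocations -- not bytes:
gc.set_threshold(700, 10, 10)
)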

> What if that memory is
> gonna stay allocated for a long time?  Then you're frequently launching
> GC runs for no tangible result except more CPU consumption and frequent
> pauses.

Every heuristic has problematic cases, that's why we call it a
heuristic :-). But somehow every other GC language manages to do
well-enough without refcounting... I think they mostly have more
sophisticated heuristics than CPython, though. Off the top of my head,
I know PyPy's heuristic involves the ratio of the size of nursery
objects versus the size of the heap, and JVMs do much cleverer things
like auto-tuning nursery size to make empirical pause times match some
target.

> Perhaps we could special-case tracebacks somehow, flag when a traceback
> remains alive after the implicit "del" clause at the end of an "except"
> block, then maintain some kind of linked list of the flagged tracebacks
> and launch specialized GC runs to find cycles accross that collection.
> That sounds quite involved, though.

We already keep a list of recently allocated objects and have a
specialized GC that runs across just that collection. That's what
generational GC is :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-25 Thread Nathaniel Smith
On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
>> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
>> what the benefit would be.  You can compose either list manually with
>> a simple comprehension:
>>
>> [interp for interp in interpreters.list_all() if interp.is_running()]
>> [interp for interp in interpreters.list_all() if not interp.is_running()]
>
> There is a inherit race condition in doing that, at least if
> interpreters are running in multiple threads (which I assume is going
> to be the overly dominant usage model).  That is why I'm proposing all
> three variants.

There's a race condition no matter what the API looks like -- having a
dedicated running_interpreters() lets you guarantee that the returned
list describes the set of interpreters that were running at some
moment in time, but you don't know when that moment was and by the
time you get the list, it's already out-of-date. So this doesn't seem
very useful. OTOH if we think that invariants like this are useful, we
might also want to guarantee that calling running_interpreters() and
idle_interpreters() gives two lists such that each interpreter appears
in exactly one of them, but that's impossible with this API; it'd
require a single function that returns both lists.

What problem are you trying to solve?

>> Likewise,
>> queue.Queue.send() supports blocking, in addition to providing a
>> put_nowait() method.
>
> queue.Queue.put() never blocks in the usual case (*), which is of an
> unbounded queue.  Only bounded queues (created with an explicit
> non-zero max_size parameter) can block in Queue.put().
>
> (*) and therefore also never deadlocks :-)

Unbounded queues also introduce unbounded latency and memory usage in
realistic situations. (E.g. a producer/consumer setup where the
producer runs faster than the consumer.) There's a reason why sockets
always have bounded buffers -- it's sometimes painful, but the pain is
intrinsic to building distributed systems, and unbounded buffers just
paper over it.

>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

Technically you just need the other end to wake up at some time in
between any two calls to send(), and if there's no GIL then this
doesn't necessarily require a context switch.

> Also, suddenly an interpreter's ability to exploit CPU time is
> dependent on another interpreter's ability to consume data in a timely
> manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> IMHO it would be better not to have such coupling.

A small buffer probably is useful in some cases, yeah -- basically
enough to smooth out scheduler jitter.

>> > it also increases the likelihood of deadlocks.
>>
>> How much of a problem will deadlocks be in practice?
>
> I expect it to happen more often than expected, in complex systems :-)
> For example, you could have a recv() loop that also from time to time
> send()s some data on another queue, depending on what is received.  But
> if that send()'s recipient also has the same structure (a recv() loop
> which send()s from time to time), then it's easy to imagine the two
> getting into a deadlock.

You kind of want to be able to create deadlocks, since the alternative
is processes that can't coordinate and end up stuck in livelocks or
with unbounded memory use etc.

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even if they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

It's fairly reasonable to implement a mutex using a CSP-style
unbuffered channel (send = acquire, receive = release).
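
A stdlib approximation of that idea -- using a size-1 bounded queue as a
stand-in for the unbuffered channel, with an invented class name:

import queue

class ChannelMutex:
    def __init__(self):
        self._chan = queue.Queue(maxsize=1)

    def acquire(self):
        # "send": blocks while another holder's token occupies the slot
        self._chan.put(None)

    def release(self):
        # "receive": empties the slot; raises queue.Empty if not held
        self._chan.get_nowait()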

Re: [Python-Dev] Investigating time for `import requests`

2017-10-01 Thread Nathaniel Smith
On Sun, Oct 1, 2017 at 7:04 PM, INADA Naoki  wrote:
> 4. http.client
>
> import time:  1376 |   2448 |   email.header
> ...
> import time:  1469 |   7791 |   email.utils
> import time:   408 |  10646 | email._policybase
> import time:   939 |  12210 |   email.feedparser
> import time:   322 |  12720 | email.parser
> ...
> import time:   599 |   1361 | email.message
> import time:  1162 |  16694 |   http.client
>
> email.parser has a very large import tree.
> But I don't know how to break the tree up.

There is some work to get urllib3/requests to stop using http.client,
though it's not clear if/when it will actually happen:
https://github.com/shazow/urllib3/pull/1068

> Another major source of slowness comes from compiling regular expressions.
> I think we can increase the cache size of `re.compile` and use on-demand
> cached compiling (e.g. `re.match()`),
> instead of "compile at import time" in many modules.

In principle re.compile() itself could be made lazy -- return a
regular expression object that just holds the string, and then compiles
and caches it the first time it's used. Might be tricky to do in a
backwards-compatible way if it moves detection of invalid regexes
from compile time to use time, but it could be an opt-in flag.
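
A sketch of what such a lazy pattern object might look like (the class
name is invented; this is not an existing API):

import re

class LazyPattern:
    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # compiled on first use, then cached

    def _get(self):
        if self._compiled is None:
            # Invalid patterns raise here, at first use, not at import time.
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def match(self, *args, **kwargs):
        return self._get().match(*args, **kwargs)

    def search(self, *args, **kwargs):
        return self._get().search(*args, **kwargs)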

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov  wrote:
> Hi,
>
> It looks like the discussion about the execution context became
> extremely hard to follow.  There are many opinions on what the spec for
> generators should look like.  What seems to be "natural"
> behaviour/example to one person seems completely unreasonable to other
> people.  Recent emails from Guido indicate that he doesn't want to
> implement execution contexts for generators (at least in 3.7).
>
> In another thread Guido said this: "... Because coroutines and
> generators are similar under the covers, Yury demonstrated the issue
> with generators instead of coroutines (which are unfamiliar to many
> people). And then somehow we got hung up about fixing the problem in
> the example."
>
> And Guido is right.  My initial motivation to write PEP 550 was to
> solve my own pain point, have a solution for async code.
> 'threading.local' is completely unusable there, but complex code bases
> demand a working solution.  I thought that because coroutines and
> generators are so similar under the hood, I can design a simple
> solution that will cover all edge cases.  Turns out it is not possible
> to do it in one pass.
>
> Therefore, in order to make some progress, I propose to split the
> problem in half:
>
> Stage 1. A new execution context PEP to solve the problem *just for
> async code*.  The PEP will target Python 3.7 and completely ignore
> synchronous generators and asynchronous generators.  It will be based
> on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
> optimization) and borrow some good API decisions from PEP 550 v3+
> (contextvars module, ContextVar class).  The API (and C-API) will be
> designed to be future proof and ultimately allow transition to the
> stage 2.

If you want to ignore generators/async generators, then I think you
don't even want PEP 550 v1, you just want something like a
{set,get}_context_state API that lets you access the ThreadState's
context dict (or rather, an opaque ContextState object that holds the
context dict), and then task schedulers can call them at appropriate
moments.
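
Roughly, a scheduler step would look like this (every name here is
hypothetical, just to show the shape of the API):

def run_task_step(task):
    prev = get_context_state()        # save the thread's current context
    set_context_state(task.context)   # swap in the task's context
    try:
        task.run_one_step()
    finally:
        task.context = get_context_state()  # capture any changes
        set_context_state(prev)             # restore the scheduler's context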

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-15 Thread Nathaniel Smith
On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum  wrote:
> On Sun, Oct 15, 2017 at 8:17 PM, Nathaniel Smith  wrote:
>>
>> On Sun, Oct 15, 2017 at 6:33 PM, Yury Selivanov 
>> wrote:
>> > Stage 1. A new execution context PEP to solve the problem *just for
>> > async code*.  The PEP will target Python 3.7 and completely ignore
>> > synchronous generators and asynchronous generators.  It will be based
>> > on PEP 550 v1 (no chained lookups, immutable mapping or CoW as an
>> > optimization) and borrow some good API decisions from PEP 550 v3+
>> > (contextvars module, ContextVar class).  The API (and C-API) will be
>> > designed to be future proof and ultimately allow transition to the
>> > stage 2.
>>
>> If you want to ignore generators/async generators, then I think you
>> don't even want PEP 550 v1, you just want something like a
>> {set,get}_context_state API that lets you access the ThreadState's
>> context dict (or rather, an opaque ContextState object that holds the
>> context dict), and then task schedulers can call them at appropriate
>> moments.
>
>
> Yes, that's what I meant by "ignoring generators". And I'd like there to be
> a "current context" that's a per-thread MutableMapping with ContextVar keys.
> Maybe there's not much more to it apart from naming the APIs for getting and
> setting it? To be clear, I am fine with this being a specific subtype of
> MutableMapping. But I don't see much benefit in making it more abstract than
> that.

We don't need it to be abstract (it's fine to have a single concrete
mapping type that we always use internally), but I think we do want it
to be opaque (instead of exposing the MutableMapping interface, the
only way to get/set specific values should be through the ContextVar
interface). The advantages are:

- This allows C level caching of values in ContextVar objects (in
particular, funneling mutations through a limited API makes cache
invalidation *much* easier)

- It gives us flexibility to change the underlying data structure
without breaking API, or for different implementations to make
different choices -- in particular, it's not clear whether a dict or
HAMT is better, and it's not clear whether a regular dict or
WeakKeyDict is better.

The first point (caching) I think is the really compelling one: in
practice decimal and numpy are already using tricky caching code to
reduce the overhead of accessing the ThreadState dict, and this gets
even trickier with context-local state which has more cache
invalidation points, so if we don't do this in the interpreter then it
could actually become a blocker for adoption. OTOH it's easy for the
interpreter itself to do this caching, and it makes everyone faster.
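
As a sketch of the caching idea -- pure-Python stand-ins for what would
really live in C, with ``_current_context`` and the version counter both
invented:

_current_context = {}  # stand-in for the interpreter's context mapping
_context_version = 0   # bumped on every mutation (and context switch)

class ContextVar:
    def __init__(self, name):
        self.name = name
        self._cached_value = None
        self._cached_version = -1  # sentinel: cache is invalid

    def set(self, value):
        global _context_version
        _current_context[self] = value
        _context_version += 1  # funneled mutation invalidates all caches

    def get(self):
        if self._cached_version != _context_version:
            self._cached_value = _current_context[self]
            self._cached_version = _context_version
        return self._cached_value  # fast path: no mapping lookup at all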

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 11:12 AM, Ethan Furman  wrote:
> What would be really nice is to have attribute access like thread locals.
> Instead of working with individual ContextVars you grab the LocalContext and
> access the vars as attributes.  I don't recall reading in the PEP why this
> is a bad idea.

You're mixing up levels -- the way threading.local objects work is
that there's one big dict that's hidden inside the interpreter (in the
ThreadState), and it holds a separate little dict for each
threading.local. The dict holding ContextVars is similar to the big
dict; a threading.local itself is like a ContextVar that holds a dict.
(And the reason it's this way is that it's easy to build either
version on top of the other, and we did some survey of threading.local
usage and the ContextVar style usage was simpler in the majority of
cases.)

For threading.local there's no way to get at the big dict at all from
Python; it's hidden inside the C APIs and threading internals. I'm
guessing you've never missed this :-). For ContextVars we can't hide
it that much, because async frameworks need to be able to swap the
current dict when switching tasks and clone it when starting a new
task, but those are the only absolutely necessary operations.
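
To make the two levels concrete, here's a rough sketch of building a
threading.local-style object on top of a single context variable (using
the ContextVar API under discussion; keying on id(self) is a shortcut a
real implementation wouldn't use):

_storage = ContextVar('local_storage')  # holds one dict per context

class ContextLocal:
    def __getattr__(self, name):
        try:
            return _storage.get({})[(id(self), name)]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        d = dict(_storage.get({}))  # copy-on-write update
        d[(id(self), name)] = value
        _storage.set(d)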

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-16 Thread Nathaniel Smith
On Mon, Oct 16, 2017 at 8:49 AM, Guido van Rossum  wrote:
> On Sun, Oct 15, 2017 at 10:26 PM, Nathaniel Smith  wrote:
>>
>> On Sun, Oct 15, 2017 at 10:10 PM, Guido van Rossum 
>> wrote:
>> > Yes, that's what I meant by "ignoring generators". And I'd like there to
>> > be
>> > a "current context" that's a per-thread MutableMapping with ContextVar
>> > keys.
>> > Maybe there's not much more to it apart from naming the APIs for getting
>> > and
>> > setting it? To be clear, I am fine with this being a specific subtype of
>> > MutableMapping. But I don't see much benefit in making it more abstract
>> > than
>> > that.
>>
>> We don't need it to be abstract (it's fine to have a single concrete
>> mapping type that we always use internally), but I think we do want it
>> to be opaque (instead of exposing the MutableMapping interface, the
>> only way to get/set specific values should be through the ContextVar
>> interface). The advantages are:
>>
>> - This allows C level caching of values in ContextVar objects (in
>> particular, funneling mutations through a limited API makes cache
>> invalidation *much* easier)
>
>
> Well the MutableMapping could still be a proxy or something that invalidates
> the cache when mutated. That's why I said it should be a single concrete
> mapping type. (It also doesn't have to derive from MutableMapping -- it's
> sufficient for it to be a duck type for one, or perhaps some Python-level
> code could `register()` it.)

MutableMapping is just a really complicated interface -- you have to
deal with iterator invalidation and popitem and implementing view
classes and all that. It seems like a lot of code for a feature that
no-one seems to worry about missing right now. (In fact, I suspect the
extra code required to implement the full MutableMapping interface on
top of a basic HAMT type is larger than the extra code to implement
the current PEP 550 draft's chaining semantics on top of this proposal
for a minimal PEP 550.)

What do you think of something like:

class Context:
    def __init__(self, /, init: MutableMapping[ContextVar, object] = {}):
        ...

    def as_dict(self) -> Dict[ContextVar, object]:
        "Returns a snapshot of the internal state."

    def copy(self) -> Context:
        "Equivalent to (but maybe faster than) Context(self.as_dict())."

I like the idea of making it possible to set up arbitrary Contexts and
introspect them, because sometimes you do need to debug weird issues
or do some wacky stuff deep in the guts of a coroutine scheduler, but
this would give us that without implementing MutableMapping's 17
methods and 7 helper classes.
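
For example, a debugging or scheduler hook could do something like this
(``task.context`` and ``my_var`` are hypothetical):

snapshot = task.context.as_dict()  # inspect a task's current state
patched = Context(init={**snapshot, my_var: "debug-override"})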

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Timeout for PEP 550 / Execution Context discussion

2017-10-17 Thread Nathaniel Smith
On Oct 17, 2017 11:25 AM, "Guido van Rossum"  wrote:


In short, I really don't think there's a need for context variables to be
faster than instance variables.


There really is: currently the cost of looking up a thread local through
the C API is a dict lookup, which is faster than instance variable lookup,
and decimal and numpy have both found that that's already too expensive.

Or maybe you're just talking about the speed when the cache misses, in
which case never mind :-).

-n


[Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-26 Thread Ethan Smith
Hello all,

I have completed an implementation for PEP 561, and believe it is time to
share the PEP and implementation with python-dev

Python-ideas threads:
 * PEP 561: Distributing and Packaging Type Information
 * PEP 561 v2 - Packaging Static Type Information
 * PEP 561: Distributing Type Information V3

The live version is here: https://www.python.org/dev/peps/pep-0561/

As always, duplicated below.

Ethan Smith

---

PEP: 561
Title: Distributing and Packaging Type Information
Author: Ethan Smith 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Sep-2017
Python-Version: 3.7
Post-History:


Abstract
========

PEP 484 introduced type hinting to Python, with goals of making typing
gradual and easy to adopt. Currently, typing information must be distributed
manually. This PEP provides a standardized means to package and distribute
type information and an ordering for type checkers to resolve modules and
collect this information for type checking using existing packaging
architecture.


Rationale
=========

Currently, package authors wish to distribute code that has
inline type information. However, there is no standard method to distribute
packages with inline type annotations or syntax that can simultaneously
be used at runtime and in type checking. Additionally, if one wished to
ship typing information privately the only method would be via setting
``MYPYPATH`` or the equivalent to manually point to stubs. If the package
can be released publicly, it can be added to typeshed [1]_. However, this
does not scale and becomes a burden on the maintainers of typeshed.
Additionally, it ties bugfixes to releases of the tool using typeshed.

PEP 484 has a brief section on distributing typing information. In this
section [2]_ the PEP recommends using ``shared/typehints/pythonX.Y/`` for
shipping stub files. However, manually adding a path to stub files for each
third party library does not scale. The simplest approach people have taken
is to add ``site-packages`` to their ``MYPYPATH``, but this causes type
checkers to fail on packages that are highly dynamic (e.g. sqlalchemy
and Django).


Specification
=============

There are several motivations and methods of supporting typing in a package.
This PEP recognizes three (3) types of packages that may be created:

1. The package maintainer would like to add type information inline.

2. The package maintainer would like to add type information via stubs.

3. A third party would like to share stub files for a package, but the
   maintainer does not want to include them in the source of the package.

This PEP aims to support these scenarios and make them simple to add to
packaging and deployment.

The two major parts of this specification are the packaging specifications
and the resolution order for resolving module type information. The packaging
spec is based on and extends PEP 345 metadata. The type checking spec is
meant to replace the ``shared/typehints/pythonX.Y/`` spec of PEP 484 [2]_.

New third party stub libraries are encouraged to distribute stubs via the
third party packaging proposed in this PEP in place of being added to
typeshed. Typeshed will remain in use, but if maintainers are found, third
party stubs in typeshed are encouraged to be split into their own package.

Packaging Type Information
--------------------------
In order to make packaging and distributing type information as simple and
easy as possible, the distribution of type information, and typed Python code
is done through existing packaging frameworks. This PEP adds a new item to the
``*.distinfo/METADATA`` file to contain metadata about a package's support for
typing. The new item is optional, but must have a name of ``Typed`` and have a
value of either ``inline`` or ``stubs``, if present.

Metadata Examples::

    Typed: inline
    Typed: stubs


Stub Only Packages
''''''''''''''''''

For package maintainers wishing to ship stub files containing all of their
type information, it is preferred that the ``*.pyi`` stubs are alongside the
corresponding ``*.py`` files. However, the stubs may be put in a sub-folder
of the Python sources, with the same name as the package the ``*.py`` files are in. For
example, the ``flyingcircus`` package would have its stubs in the folder
``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are
not found in ``flyingcircus/`` the type checker may treat the subdirectory as
a normal package. The normal resolution order of checking ``*.pyi`` before
``*.py`` will be maintained.
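
An example layout (module names purely illustrative)::

    flyingcircus/
        __init__.py
        sketches.py
        flyingcircus/        # stub sub-folder, same name as the package
            __init__.pyi
            sketches.pyi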

Third Party Stub Packages
'''''''''''''''''''''''''

Third parties seeking to distribute stub files are encouraged to contact the
maintainer of the package about distribution alongside the package

Re: [Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-26 Thread Ethan Smith
On Thu, Oct 26, 2017 at 4:48 PM, Mariatta Wijaya 
wrote:

> Not sure if postings to python-ideas count,
>
>
> PEP 1 says:
>
> Post-History is used to record the dates of when new versions of the PEP
> are posted to python-list and/or python-dev.
>
> So, no ?
>


Reading PEP 12, https://www.python.org/dev/peps/pep-0012/#id24


   Leave Post-History alone for now; you'll add dates to this header each
   time you post your PEP to python-l...@python.org or python-dev@python.org.
   If you posted your PEP to the lists on August 14, 2001 and September 3,
   2001, the Post-History header would look like:

   Post-History: 14-Aug-2001, 03-Sept-2001

   You must manually add new dates and check them in. If you don't have
   check-in privileges, send your changes to the PEP editors.

Perhaps it is outdated and needs to have python-ideas added? python-ideas
was created around 2006 (according to the archives), so after PEP 1/12 were
written.


> This PEP adds a new item to the
> ``*.distinfo/METADATA`` file

   *.dist-info/

Thank you for catching that. I will fix that with my next round of edits.


Re: [Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-27 Thread Nathaniel Smith
On Thu, Oct 26, 2017 at 3:42 PM, Ethan Smith  wrote:
> However, the stubs may be put in a sub-folder
> of the Python sources, with the same name the ``*.py`` files are in. For
> example, the ``flyingcircus`` package would have its stubs in the folder
> ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are
> not found in ``flyingcircus/`` the type checker may treat the subdirectory as
> a normal package.

I admit that I find this aesthetically unpleasant. Wouldn't something
like __typestubs__/ be a more Pythonic name? (And also avoid potential
name clashes, e.g. my async_generator package has a top-level export
called async_generator; normally you do 'from async_generator import
async_generator'. I think that might cause problems if I created an
async_generator/async_generator/ directory, especially post-PEP 420.)

I also don't understand the given rationale -- it sounds like you want
to be able to say: well, if ${SOME_DIR_ON_PYTHONPATH}/flyingcircus/
doesn't contain stubs, then just stick the
${SOME_DIR_ON_PYTHONPATH}/flyingcircus/ directory *itself* onto
PYTHONPATH, and then try again. But that's clearly the wrong thing,
because then you'll also be adding a bunch of other random junk from
that directory into the top-level namespace. For example, suddenly the
flyingcircus.summarise_proust module has become a top-level
summarise_proust package. I must be misunderstanding something?

> Type Checker Module Resolution Order
> ------------------------------------
>
> The following is the order that type checkers supporting this PEP should
> resolve modules containing type information:
>
> 1. User code - the files the type checker is running on.
>
> 2. Stubs or Python source manually put in the beginning of the path. Type
>checkers should provide this to allow the user complete control of which
>stubs to use, and patch broken stubs/inline types from packages.
>
> 3. Third party stub packages - these packages can supersede the installed
>untyped packages. They can be found at ``pkg-stubs`` for package ``pkg``,
>however it is encouraged to check the package's metadata using packaging
>query APIs such as ``pkg_resources`` to assure that the package is meant
>for type checking, and is compatible with the installed version.

Am I right that this means you need to be able to map from import
names to distribution names? I.e., if you see 'import foo', you need
to figure out which *.dist-info directory contains metadata for the
'foo' package? How do you plan to do this?

The problem is that technically, import names and distribution names
are totally unrelated namespaces -- for example, the '_pytest' package
comes from the 'pytest' distribution, the 'pylab' package comes from
'matplotlib', and 'pip install scikit-learn' gives you a package
imported as 'sklearn'. Namespace packages are also challenging,
because a single top-level package might actually be spread across
multiple distributions.
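
The closest thing I know of is scraping each distribution's
``top_level.txt`` metadata, which isn't guaranteed to be present or
accurate -- a sketch, not a recommendation:

import pkg_resources

import_to_dist = {}
for dist in pkg_resources.working_set:
    if dist.has_metadata('top_level.txt'):
        for top in dist.get_metadata('top_level.txt').split():
            import_to_dist[top] = dist.project_name

# e.g. import_to_dist.get('pylab') -> 'matplotlib' (when installed)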

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 561: Distributing and Packaging Type Information

2017-10-29 Thread Ethan Smith
On Fri, Oct 27, 2017 at 12:44 AM, Nathaniel Smith  wrote:

> On Thu, Oct 26, 2017 at 3:42 PM, Ethan Smith  wrote:
> > However, the stubs may be put in a sub-folder
> > of the Python sources, with the same name the ``*.py`` files are in. For
> > example, the ``flyingcircus`` package would have its stubs in the folder
> > ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are
> > not found in ``flyingcircus/`` the type checker may treat the
> subdirectory as
> > a normal package.
>
> I admit that I find this aesthetically unpleasant. Wouldn't something
> like __typestubs__/ be a more Pythonic name? (And also avoid potential
> name clashes, e.g. my async_generator package has a top-level export
> called async_generator; normally you do 'from async_generator import
> async_generator'. I think that might cause problems if I created an
> async_generator/async_generator/ directory, especially post-PEP 420.)
>

I agree, this is unpleasant. I am now of the view that if maintainers do
not wish to ship stubs alongside their Python code, they should just create
separate stub-only packages. I don't think there is a particular need to
special-case this for minor convenience.



> > Type Checker Module Resolution Order
> > ------------------------------------
> >
> > The following is the order that type checkers supporting this PEP should
> > resolve modules containing type information:
> >
> > 1. User code - the files the type checker is running on.
> >
> > 2. Stubs or Python source manually put in the beginning of the path. Type
> >checkers should provide this to allow the user complete control of
> which
> >stubs to use, and patch broken stubs/inline types from packages.
> >
> > 3. Third party stub packages - these packages can supersede the installed
> >untyped packages. They can be found at ``pkg-stubs`` for package
> ``pkg``,
> >however it is encouraged to check the package's metadata using
> packaging
> >query APIs such as ``pkg_resources`` to assure that the package is
> meant
> >for type checking, and is compatible with the installed version.
>
> Am I right that this means you need to be able to map from import
> names to distribution names? I.e., if you see 'import foo', you need
> to figure out which *.dist-info directory contains metadata for the
> 'foo' package? How do you plan to do this?
>
>
The problem is that technically, import names and distribution names
> are totally unrelated namespaces -- for example, the '_pytest' package
> comes from the 'pytest' distribution, the 'pylab' package comes from
> 'matplotlib', and 'pip install scikit-learn' gives you a package
> imported as 'sklearn'. Namespace packages are also challenging,
> because a single top-level package might actually be spread across
> multiple distributions.
>
>
This is a problem. What I now realize is that the typing metadata is needed
for *packages* and not distributions. I will work on a new proposal that
makes the metadata per-package. It will require a slightly more complicated
proposal, but I feel that it is necessary. Thank you for catching this
issue with my proposal; I probably should have noticed it earlier.

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-05 Thread Nathaniel Smith
On Nov 5, 2017 2:41 PM, "Paul Ganssle"  wrote:

I think the question of whether any specific implementation of dict could
be made faster for a given architecture or even that the trade-offs made by
CPython are generally the right ones is kinda beside the point. It's
certainly feasible that an implementation that does not preserve ordering
could be better for some implementation of Python, and the question is
really how much is gained by changing the language semantics in such a way
as to cut off that possibility.


The language definition is not nothing, but I think it's easy to
overestimate its importance. CPython does in practice provide ordering
guarantees for dicts, and this solves a whole bunch of pain points: it
makes json roundtripping work better, it gives ordered kwargs, it makes it
possible for metaclasses to see the order class items were defined, etc.
And we got all these goodies for better-than-free: the new dict is faster
and uses less memory. So it seems very unlikely that CPython is going to
revert this change in the foreseeable future, and that means people will
write code that depends on this, and that means in practice reverting it
will become impossible due to backcompat and it will be important for other
interpreters to implement, regardless of what the language definition says.
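
For instance, on CPython 3.6+ all of the following hold (shown only to
make the "goodies" concrete):

>>> import json
>>> json.loads('{"b": 1, "a": 2}')  # roundtrips in document order
{'b': 1, 'a': 2}
>>> def f(**kwargs): return list(kwargs)
...
>>> f(b=1, a=2)  # kwargs order is preserved
['b', 'a']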

That said, there are real benefits to putting this in the spec. Given that
we're not going to get rid of it, we might as well reward the minority of
programmers who are conscientious about following the spec by letting them
use it too. And there were multiple PEPs that went away when this was
merged; no one wants to resurrect them just for hypothetical future
implementations that may never exist. And putting it in the spec will mean
that we can stop having this argument over and over with the same points
rehashed for those who missed the last one. (This isn't aimed at you or
anything; it's not your fault you don't know all these arguments off the
top of your head, because how would you? But it is a reality of mailing
list dynamics that rehashing this kind of thing sucks up energy without
producing much.)

MicroPython deviates from the language spec in lots of ways. Hopefully this
won't need to be another one, but it won't be the end of the world if it is.

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-06 Thread Nathaniel Smith
On Sun, Nov 5, 2017 at 9:38 PM, Nick Coghlan  wrote:
> We've been running the current experiment for 7 years, and the main
> observable outcome has been folks getting surprised by breaking
> changes in CPython releases, especially folks that primarily use
> Python interactively (e.g. for data analysis), or as a scripting
> engine (e.g. for systems administration).

It's also caused lots of projects to switch to using their own ad hoc
warning types for deprecations, e.g. off the top of my head:

https://github.com/matplotlib/matplotlib/blob/6c51037864f9a4ca816b68ede78207f7ecec656c/lib/matplotlib/cbook/deprecation.py#L5
https://github.com/numpy/numpy/blob/d75b86c0c49f7eb3ec60564c2e23b3ff237082a2/numpy/_globals.py#L45
https://github.com/python-trio/trio/blob/f50aa8e00c29c7f2953b7bad38afc620772dca74/trio/_deprecate.py#L16

So in some ways the change has actually made it *harder* for end-user
applications/scripts to hide all deprecation warnings, because for
each package you use you have to somehow figure out which
idiosyncratic type it uses, and filter them each separately.
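
The pattern those projects use boils down to something like this
(simplified, with invented names):

import warnings

class MyLibDeprecationWarning(UserWarning):
    # Deliberately *not* a DeprecationWarning subclass, so that the
    # default filters show it.
    pass

def old_api():
    warnings.warn("old_api() is deprecated", MyLibDeprecationWarning,
                  stacklevel=2)

# ...which each downstream application must then silence individually:
warnings.simplefilter("ignore", MyLibDeprecationWarning)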

(In any changes though please do keep in mind that Python itself is
not the only one issuing deprecation warnings. I'm thinking in
particular of the filter-based-on-Python-version idea. Maybe you could
have subclasses like Py35DeprecationWarning and filter on those?)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Remove typing from the stdlib

2017-11-06 Thread Ethan Smith
> Beyond the API already proposed in PEP 557, this would mean adding:
>
> * dataclasses.ClassVar (as proposed above)
> * dataclasses.Any (probably just set to the literal string
> "dataclasses.Any")
> * dataclasses.NamedTuple (as a replacement for typing.NamedTuple)
> * potentially dataclasses.is_class_var (instead of dataclasses
> implicitly snooping in sys.modules looking for a "typing" module)
>
> In effect, where we now have "typing" and "typing_extensions", we'd
> instead have "dataclasses" in the standard library (with the subset of
> annotations needed to define data record classes), and "typing" as an
> independently versioned module on PyPI.
>
>
I'm not so keen on this because I think some things in typing (such as
NamedTuple) probably deserve to be in the collections module. And some of
the ABCs could probably also be merged with collections.abc, but doing this
correctly, and not all at once, would be quite difficult.
I do think the typing concepts should be better integrated into the
standard library. However, a fair amount of the classes you list (such as
NamedTuple) are in and of themselves dependent on parts of typing.

Cheers,
Ethan

I think such an approach could address a few problems:
>
> - the useful-at-runtime concepts (in dataclasses) would be clearly
> separated from the static-type-analysis enablers (in typing)
> - type hints would still have a clear presence in the regular standard
> library, but the more complex parts would be opt-in (similar to
> statistics vs SciPy, secrets vs cryptography, etc)
> - as various concepts in typing matured and became genuinely stable,
> they could potentially "graduate" into the standard library's
> dataclasses module
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 06:24, "Nick Coghlan"  wrote:

On 7 November 2017 at 19:30, Paul Moore  wrote:
> On 7 November 2017 at 04:09, Nick Coghlan  wrote:
>> Given the status quo, how do educators learn that the examples they're
>> teaching to their students are using deprecated APIs?
>
> By reading the documentation on what they are teaching, and by testing
> their examples with new versions with deprecation warnings turned on?
> Better than having warnings appear the first time they run a course
> with a new version of Python, surely?
>
> I understand the "but no-one actually does this" argument. And I
> understand that breakage as a result is worse than a few warnings. But
> enabling deprecation warnings by default feels to me like favouring
> the developer over the end user. I remember before the current
> behaviour was enabled and it was *immensely* frustrating to try to use
> 3rd party code and get a load of warnings. The only options were:
>
> 1. Report the bug - usually not much help, as I want to run the
> program *now*, not when a new release is made.
> 2. Fix the code (and ideally submit a PR upstream) - I want to *use*
> the program, not debug it.
> 3. Find the right setting/environment variable, and tweak how I call
> the program to apply it - which doesn't fix the root cause, it's just
> a workaround.

Yes, this is why I've come around to the view that we need to come up
with a viable definition of "third party code" and leave deprecation
warnings triggered by that code disabled by default.

My suggestion for that definition is to have the *default* meaning of
"third party code" be "everything that isn't __main__".


That way, if you get a deprecation warning at the REPL, it's
necessarily because of something *you* did, not because of something a
library you called did. Ditto for single file scripts.


IPython actually made this change a few years ago; since 2015 I think it
has shown DeprecationWarnings by default if they're triggered by __main__.

It's helpful but I haven't noticed it eliminating this problem. One
limitation in particular is that it requires that the warnings are
correctly attributed to the code that triggered them, which means that
whoever is issuing the warning has to set the stacklevel= correctly, and
most people don't. (The default of stacklevel=1 is always wrong for
DeprecationWarning.) Also, IIRC it's actually impossible to set the
stacklevel= correctly when you're deprecating a whole module and issue the
warning at import time, because you need to know how many stack frames the
import system uses.
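
For reference, getting it right for an ordinary deprecated function
looks something like this (function names invented):

import warnings

def old_function():
    warnings.warn("old_function() is deprecated; use new_function()",
                  DeprecationWarning,
                  stacklevel=2)  # 2 = attribute the warning to the caller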

-n


Re: [Python-Dev] Guarantee ordered dict literals in v3.7?

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 12:02 PM, "Barry Warsaw"  wrote:

On Nov 7, 2017, at 09:39, Paul Sokolovsky  wrote:

> So, the problem is that there's no "Python language spec”.

There is a language specification: https://docs.python.org/3/reference/index.html

But there are still corners that are undocumented, or topics that are
deliberately left as implementation details.


Also, specs don't mean that much unless there are multiple implementations
in widespread use. In JS the spec matters because it describes the common
subset of the language you can expect to see across browsers, and lets the
browser vendors coordinate on future changes. Since users actually target
and test against multiple implementations, this is useful. In python,
CPython's dominance means that most libraries are written against CPython's
behavior instead of the spec, and alternative implementations generally
don't care about the spec, they care about whether they can run the code
their users want to run. So PyPy has found that for their purposes, the
python spec includes all kinds of obscure internal implementation details
like CPython's static type/heap type distinction, the exact tricks CPython
uses to optimize local variable access, the CPython C API, etc. The Pyston
devs found that for their purposes, refcounting actually was a mandatory
part of the python language. Jython, MicroPython, etc make a different set
of compatibility tradeoffs again.

I'm not saying the spec is useless, but it's not magic either. It only
matters to the extent that it solves some problem for people.

-n


Re: [Python-Dev] The current dict is not an "OrderedDict"

2017-11-09 Thread Nathaniel Smith
On Thu, Nov 9, 2017 at 1:46 PM, Cameron Simpson  wrote:
> On 08Nov2017 10:28, Antoine Pitrou  wrote:
>>
>> On Wed, 8 Nov 2017 13:07:12 +1000
>> Nick Coghlan  wrote:
>>>
>>> On 8 November 2017 at 07:19, Evpok Padding 
>>> wrote:
>>> > On 7 November 2017 at 21:47, Chris Barker 
>>> > wrote:
>>> >> if dict order is preserved in cPython , people WILL count on it!
>>> >
>>> > I won't, and if people do and their code break, they'll have only
>>> > themselves
>>> > to blame.
>>> > Also, what proof do you have of that besides anecdotal evidence ?
>>>
>>> ~27 calendar years of anecdotal evidence across a multitude of CPython
>>> API behaviours (as well as API usage in other projects).
>>>
>>> Other implementation developers don't say "CPython's runtime behaviour
>>> is the real Python specification" for the fun of it - they say it
>>> because "my code works on CPython, but it does the wrong thing on your
>>> interpreter, so I'm going to stick with CPython" is a real barrier to
>>> end user adoption, no matter what the language specification says.
>>
>>
>> Yet, PyPy has no reference counting, and it doesn't seem to be a cause
>> of concern.  Broken code is fixed along the way, when people notice.
>
>
> I'd expect that this may be because that would merely to cause temporary
> memory leakage or differently timed running of __del__ actions.  Neither of
> which normally affects semantics critical to the end result of most
> programs.

It's actually a major problem when porting apps to PyPy. The common
case is servers that crash because they rely on the GC to close file
descriptors, and then run out of file descriptors. IIRC this is the
major obstacle to supporting OpenStack-on-PyPy. NumPy is currently
going through the process to deprecate and replace a core bit of API
[1] because it turns out to assume a refcounting GC.
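
The canonical failure is code that leans on CPython's prompt,
refcount-based finalization (paths invented):

def read_all(path):
    return open(path).read()  # fd closed only whenever the GC gets to it

def read_all_portable(path):
    with open(path) as f:     # fd closed deterministically
        return f.read()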

-n

[1] See:
  https://github.com/numpy/numpy/pull/9639
  https://mail.python.org/pipermail/numpy-discussion/2017-November/077367.html

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling depreciation warnings feature code cutoff

2017-11-09 Thread Nathaniel Smith
On Nov 8, 2017 16:12, "Nick Coghlan"  wrote:

On 9 November 2017 at 07:46, Antoine Pitrou  wrote:
>
> Le 08/11/2017 à 22:43, Nick Coghlan a écrit :
>>
>> However, between them, the following two guidelines should provide
>> pretty good deprecation warning coverage for the world's Python code:
>>
>> 1. If it's in __main__, it will emit deprecation warnings at runtime
>> 2. If it's not in __main__, it should have a test suite
>
> Nick, have you actually read the discussion and the complaints people
> had with the current situation?  Most of them *don't* specifically talk
> about __main__ scripts.

I have, and I've also re-read the discussions regarding why the
default got changed in the first place.

Behaviour up until 2.6 & 3.1:

once::DeprecationWarning

Behaviour since 2.7 & 3.2:

ignore::DeprecationWarning

With test runners overriding the default filters to set it back to
"once::DeprecationWarning".


Is this intended to be a description of the current state of affairs?
Because I've never encountered a test runner that does this... Which
runners are you thinking of?


The rationale for that change was so that end users of applications
that merely happened to be written in Python wouldn't see deprecation
warnings when Linux distros (or the end user) updated to a new Python
version. It had the downside that you had to remember to opt-in to
deprecation warnings in order to see them, which is a problem if you
mostly use Python for ad hoc personal scripting.

Proposed behaviour for Python 3.7+:

once::DeprecationWarning:__main__
ignore::DeprecationWarning

With test runners still overriding the default filters to set them
back to "once::DeprecationWarning".

This is a partial reversion back to the pre-2.7 behaviour, focused
specifically on interactive use and ad hoc personal scripting. For ad
hoc *distributed* scripting, the changed default encourages upgrading
from single-file scripts to the zipapp model, and then minimising the
amount of code that runs directly in __main__.py.

I expect this will be a sufficient change to solve the specific
problem I'm personally concerned by, so I'm no longer inclined to
argue for anything more complicated. Other folks may have other
concerns that this tweak to the default filters doesn't address - they
can continue to build their case for more complex options using this
as the new baseline behaviour.


I think most people's concern is that we've gotten into a state where
DeprecationWarnings are largely useless in practice, because no one sees
them. Effectively the norm now is that developers (both the Python core
team and downstream libraries) think they're following some sensible
deprecation cycle, but often they're actually making changes without any
warning; they just wait a year to do it. It's not clear why we bother
stretching deprecations across multiple releases -- which adds major
overhead -- if in practice we aren't going to actually warn most people.
Enabling them for another 1% of code doesn't really address this.

As I mentioned above, it's also having the paradoxical effect of making it
so that end-users are *more* likely to see deprecation warnings, since
major libraries are giving up on using DeprecationWarning. Most recently it
looks like pyca/cryptography is going to switch, partly as a result of this
thread:
  https://github.com/pyca/cryptography/pull/4014

Some more ideas to throw out there:

- if an envvar CI=true is set, then by default make deprecation warnings
into errors. (This is an informal standard that lots of CI systems use.
Error instead of "once" because most people don't look at CI output at all
unless there's an error.)

- provide some mechanism that makes it easy to have a deprecation warning
that starts out as invisible, but then becomes visible as you get closer to
the switchover point. (E.g. CPython might make the deprecation warnings
that it issues be invisible in 3.x.0 and 3.x.1 but become visible in
3.x.2+.) Maybe:

# in warnings.py
def deprecation_warning(library_version, visible_in_version,
                        change_in_version, msg, stacklevel):
    ...

Then a call like:

  deprecation_warning(my_library.__version__, "1.3", "1.4",
                      "This function is deprecated", 2)

issues an InvisibleDeprecationWarning if my_library.__version__ < 1.3, and
a VisibleDeprecationWarning otherwise.

(The stacklevel argument is mandatory because the usual default of 1 is
always wrong for deprecation warnings.)
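
A rough sketch of how that helper could work -- the warning classes and
the version parsing here are assumptions, not existing APIs:

import warnings
from distutils.version import LooseVersion  # stand-in version comparison

class InvisibleDeprecationWarning(DeprecationWarning):
    pass  # assumed: ignored by the default filters

class VisibleDeprecationWarning(UserWarning):
    pass  # assumed: shown by the default filters

def deprecation_warning(library_version, visible_in_version,
                        change_in_version, msg, stacklevel):
    if LooseVersion(library_version) < LooseVersion(visible_in_version):
        category = InvisibleDeprecationWarning
    else:
        category = VisibleDeprecationWarning
    # +1 accounts for this helper's own stack frame
    warnings.warn(msg, category, stacklevel=stacklevel + 1)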

-n


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-10 Thread Nathaniel Smith
On Tue, Nov 7, 2017 at 8:45 AM, Nathaniel Smith  wrote:
> Also, IIRC it's actually impossible to set the stacklevel= correctly when
> you're deprecating a whole module and issue the warning at import time,
> because you need to know how many stack frames the import system uses.

Doh, I didn't remember correctly. Actually Brett fixed this in 3.5:
https://bugs.python.org/issue24305

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] [python-committers] Enabling depreciation warnings feature code cutoff

2017-11-11 Thread Nathaniel Smith
On Fri, Nov 10, 2017 at 11:34 PM, Brett Cannon  wrote:
> On Thu, Nov 9, 2017, 17:33 Nathaniel Smith,  wrote:
>> - if an envvar CI=true is set, then by default make deprecation warnings
>> into errors. (This is an informal standard that lots of CI systems use.
>> Error instead of "once" because most people don't look at CI output at all
>> unless there's an error.)
>
> One problem with that is I don't want e.g. mypy to start spewing out
> warnings while checking my code. That's why I like Victor's idea of a -X
> option that also flips on other test/debug features. Yes, this would also
> trigger for test runners, but that's at least a smaller amount of affected
> code.

Ah, yeah, you're right -- often CI systems use Python programs for
infrastructure, beyond the actual code under test. pip is maybe a more
obvious example than mypy -- we probably don't want pip to stop
working in CI runs just because it happens to use a deprecated API
somewhere :-). So this idea doesn't work.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


[Python-Dev] PEP 561 rework

2017-11-12 Thread Ethan Smith
Hello,

I re-wrote my PEP to have typing opt-in be per-package rather than
per-distribution. This greatly simplifies things, and thanks to the
feedback and suggestions of Nick Coghlan, it is entirely compatible with
older packaging tooling.

The main idea is there are two types of packages:
 - types are packaged with runtime code (inline or stubs in the same
package)
 - types are in a separate package (a third party or maintainer wants to
ship type information, but not with runtime code).

The PEP is live on python.org: https://www.python.org/dev/peps/pep-0561/

And as always, duplicated below.

Cheers,

Ethan Smith

---

PEP: 561
Title: Distributing and Packaging Type Information
Author: Ethan Smith 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Sep-2017
Python-Version: 3.7
Post-History: 10-Sep-2017, 12-Sep-2017, 06-Oct-2017, 26-Oct-2017


Abstract
========

PEP 484 introduced type hinting to Python, with goals of making typing
gradual and easy to adopt. Currently, typing information must be distributed
manually. This PEP provides a standardized means to leverage existing tooling
to package and distribute type information with minimal work and an ordering
for type checkers to resolve modules and collect this information for type
checking.


Rationale
=========

Currently, package authors wish to distribute code that has inline type
information. Additionally, maintainers would like to distribute stub files
to keep Python 2 compatibility while using newer annotation syntax. However,
there is no standard method to distribute packages with type information.
Also, if one wished to ship stub files privately the only method available
would be via setting ``MYPYPATH`` or the equivalent to manually point to
stubs. If the package can be released publicly, it can be added to
typeshed [1]_. However, this does not scale and becomes a burden on the
maintainers of typeshed. In addition, it ties bug fixes in stubs to releases
of the tool using typeshed.

PEP 484 has a brief section on distributing typing information. In this
section [2]_ the PEP recommends using ``shared/typehints/pythonX.Y/`` for
shipping stub files. However, manually adding a path to stub files for each
third party library does not scale. The simplest approach people have taken
is to add ``site-packages`` to their ``MYPYPATH``, but this causes type
checkers to fail on packages that are highly dynamic (e.g. sqlalchemy
and Django).


Definition of Terms
===================

The definitions of "MAY", "MUST", "SHOULD", and "SHOULD NOT" are
to be interpreted as described in RFC 2119.

"inline" - the types are part of the runtime code using PEP 526 and 3107
syntax.

"stubs" - files containing only type information, empty of runtime code.

"Distributions" are the packaged files which are used to publish and distribute
a release. [3]_

"Module" a file containing Python runtime code or stubbed type information.

"Package" a directory or directories that namespace Python modules.


Specification
=============

There are several motivations and methods of supporting typing in a package.
This PEP recognizes three (3) types of packages that users of typing wish to
create:

1. The package maintainer would like to add type information inline.

2. The package maintainer would like to add type information via stubs.

3. A third party or package maintainer would like to share stub files for
   a package, but the maintainer does not want to include them in the source
   of the package.

This PEP aims to support these scenarios and make them simple to add to
packaging and deployment.

The two major parts of this specification are the packaging specifications
and the resolution order for resolving module type information. The type
checking spec is meant to replace the ``shared/typehints/pythonX.Y/`` spec
of PEP 484 [2]_.

New third party stub libraries SHOULD distribute stubs via the third party
packaging methods proposed in this PEP in place of being added to typeshed.
Typeshed will remain in use, but if maintainers are found, third party stubs
in typeshed MAY be split into their own package.


Packaging Type Information
--------------------------

In order to make packaging and distributing type information as simple and
easy as possible, packaging and distribution is done through existing
frameworks.

Package maintainers who wish to support type checking of their code MUST add
a ``py.typed`` file to their package supporting typing. This marker is
recursive: if a top-level package includes it, all of its sub-packages MUST
support type checking as well. To have this file installed with the package,
maintainers can use existing packaging options such as ``package_data`` in
distutils, shown below.

Distutils option example::

    ...
    package_data = {
        'pkg': ['py.typed'],
    },

Re: [Python-Dev] PEP 561 rework

2017-11-12 Thread Ethan Smith
On Sun, Nov 12, 2017 at 9:53 AM, Jelle Zijlstra 
wrote:

>
>
> 2017-11-12 3:40 GMT-08:00 Ethan Smith :
>
>> Hello,
>>
>> I re-wrote my PEP to have typing opt-in be per-package rather than
>> per-distribution. This greatly simplifies things, and thanks to the
>> feedback and suggestions of Nick Coghlan, it is entirely compatible with
>> older packaging tooling.
>>
>> The main idea is there are two types of packages:
>>  - types are packaged with runtime code (inline or stubs in the same
>> package)
>>  - types are in a separate package (a third party or maintainer wants to
>> ship type information, but not with runtime code).
>>
>> The PEP is live on python.org: https://www.python.org/dev/peps/pep-0561/
>>
>> And as always, duplicated below.
>>
>> Cheers,
>>
>> Ethan Smith
>>
>> ---
>>
>> PEP: 561
>> Title: Distributing and Packaging Type Information
>> Author: Ethan Smith 
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 09-Sep-2017
>> Python-Version: 3.7
>> Post-History: 10-Sep-2017, 12-Sep-2017, 06-Oct-2017, 26-Oct-2017
>>
>>
>> Abstract
>> 
>>
>> PEP 484 introduced type hinting to Python, with goals of making typing
>> gradual and easy to adopt. Currently, typing information must be distributed
>> manually. This PEP provides a standardized means to leverage existing tooling
>> to package and distribute type information with minimal work and an ordering
>> for type checkers to resolve modules and collect this information for type
>> checking.
>>
>>
>> Rationale
>> =
>>
>> Currently, package authors wish to distribute code that has inline type
>> information. Additionally, maintainers would like to distribute stub files
>> to keep Python 2 compatibility while using newer annotation syntax. However,
>> there is no standard method to distribute packages with type information.
>> Also, if one wished to ship stub files privately the only method available
>> would be via setting ``MYPYPATH`` or the equivalent to manually point to
>> stubs. If the package can be released publicly, it can be added to
>> typeshed [1]_. However, this does not scale and becomes a burden on the
>> maintainers of typeshed. In addition, it ties bug fixes in stubs to releases
>> of the tool using typeshed.
>>
>> PEP 484 has a brief section on distributing typing information. In this
>> section [2]_ the PEP recommends using ``shared/typehints/pythonX.Y/`` for
>> shipping stub files. However, manually adding a path to stub files for each
>> third party library does not scale. The simplest approach people have taken
>> is to add ``site-packages`` to their ``MYPYPATH``, but this causes type
>> checkers to fail on packages that are highly dynamic (e.g. sqlalchemy
>> and Django).
>>
>>
>> Definition of Terms
>> ===
>>
>> The definition of "MAY", "MUST", and "SHOULD", and "SHOULD NOT" are
>> to be interpreted as described in RFC 2119.
>>
>> "inline" - the types are part of the runtime code using PEP 526 and 3107
>> syntax.
>>
>> "stubs" - files containing only type information, empty of runtime code.
>>
>> "Distributions" are the packaged files which are used to publish and 
>> distribute
>> a release. [3]_
>>
>> "Module" a file containing Python runtime code or stubbed type information.
>>
>> "Package" a directory or directories that namespace Python modules.
>>
>>
>> Specification
>> =
>>
>> There are several motivations and methods of supporting typing in a package.
>> This PEP recognizes three (3) types of packages that users of typing wish to
>> create:
>>
>> 1. The package maintainer would like to add type information inline.
>>
>> 2. The package maintainer would like to add type information via stubs.
>>
>> 3. A third party or package maintainer would like to share stub files for
>>a package, but the maintainer does not want to include them in the source
>>of the package.
>>
>> This PEP aims to support these scenarios and make them simple to add to
>> packaging and deployment.
>>
>> The two major parts of this specification are the packaging specifications
>> and the resolution order for resolving module type information. The type
>> checking spec is meant to replace the ``shared/typehints/pythonX.Y/`` spec
>> of PEP 484 [2]_.

Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan  wrote:
> This change will lead to DeprecationWarning being displayed by default for:
>
> * code executed directly at the interactive prompt
> * code executed directly as part of a single-file script

Technically it's orthogonal, but if you're trying to get better
warnings in the REPL, then you might also want to look at:

https://bugs.python.org/issue1539925
https://github.com/ipython/ipython/issues/6611

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 561 rework

2017-11-12 Thread Nathaniel Smith
On Sun, Nov 12, 2017 at 11:21 AM, Ethan Smith  wrote:
>
>
> On Sun, Nov 12, 2017 at 9:53 AM, Jelle Zijlstra 
> wrote:
>>
>> 2017-11-12 3:40 GMT-08:00 Ethan Smith :
>>> The name of the stub package MUST follow the scheme ``pkg_stubs`` for
>>> type stubs for the package named ``pkg``. The normal resolution order of
>>> checking ``*.pyi`` before ``*.py`` will be maintained.
>>
>> This is very minor, but what do you think of using "pkg-stubs" instead of
>> "pkg_stubs" (using a hyphen rather than an underscore)? This would make the
>> name illegal to import as a normal Python package, which makes it clear that
>> it's not a normal package. Also, there could be real packages named
>> "_stubs".
>
> I suppose this makes sense. I checked PyPI and as of a few weeks ago there
> were no packages with the name pattern, but I like the idea of making it
> explicitly non-runtime importable. I cannot think of any reason not to do
> it, and the avoidance of confusion about the package being importable is a
> benefit. I will make the change with my next round of edits.

PyPI doesn't distinguish between the names 'foo-stubs' and 'foo_stubs'
-- they get normalized together. So even if you use 'foo-stubs' as the
directory name on sys.path to avoid collisions at import time, it
still won't allow someone to distribute a separate 'foo_stubs' package
on PyPI.

If you do go with a fixed naming convention like this, the PEP should
probably also instruct the PyPI maintainers that whoever owns 'foo'
automatically has the right to control the name 'foo-stubs' as well.
Or maybe some tweak to PEP 541 is needed.
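
For concreteness, a stub-only distribution under the proposed scheme might
look something like this -- hypothetical names, and a sketch rather than
anything the PEP text mandates:

    pkg_stubs/                  # or pkg-stubs, per Jelle's suggestion
        __init__.pyi            # stubs for pkg/__init__.py
        core.pyi                # stubs for pkg/core.py

    # setup.py
    from setuptools import setup

    setup(
        name="pkg_stubs",       # the distribution name on PyPI
        version="0.1",
        packages=["pkg_stubs"],
        # .pyi files are not picked up by default, so list them explicitly
        package_data={"pkg_stubs": ["*.pyi"]},
    )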

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 561 rework

2017-11-12 Thread Ethan Smith
On Sun, Nov 12, 2017 at 8:07 PM, Nathaniel Smith  wrote:

> On Sun, Nov 12, 2017 at 11:21 AM, Ethan Smith  wrote:
> >
> >
> > On Sun, Nov 12, 2017 at 9:53 AM, Jelle Zijlstra <
> jelle.zijls...@gmail.com>
> > wrote:
> >>
> >> 2017-11-12 3:40 GMT-08:00 Ethan Smith :
> >>> The name of the stub package MUST follow the scheme ``pkg_stubs`` for
> >>> type stubs for the package named ``pkg``. The normal resolution order
> >>> of checking ``*.pyi`` before ``*.py`` will be maintained.
> >>
> >> This is very minor, but what do you think of using "pkg-stubs" instead
> >> of "pkg_stubs" (using a hyphen rather than an underscore)? This would
> >> make the name illegal to import as a normal Python package, which makes
> >> it clear that it's not a normal package. Also, there could be real
> >> packages named "_stubs".
> >
> > I suppose this makes sense. I checked PyPI and as of a few weeks ago
> > there were no packages with the name pattern, but I like the idea of
> > making it explicitly non-runtime importable. I cannot think of any
> > reason not to do it, and the avoidance of confusion about the package
> > being importable is a benefit. I will make the change with my next
> > round of edits.
>
> PyPI doesn't distinguish between the names 'foo-stubs' and 'foo_stubs'
> -- they get normalized together. So even if you use 'foo-stubs' as the
> directory name on sys.path to avoid collisions at import time, it
> still won't allow someone to distribute a separate 'foo_stubs' package
> on PyPI.
>
> If you do go with a fixed naming convention like this, the PEP should
> probably also instruct the PyPI maintainers that whoever owns 'foo'
> automatically has the right to control the name 'foo-stubs' as well.
> Or maybe some tweak to PEP 541 is needed.
>

As I understand it, however, the distribution name need not map to the
package name in any way. So regardless of whether foo-stubs is normalized to
foo_stubs, I could name the distribution Albatross if I wished, install the
foo-stubs package into site/dist-packages, and it would work. Also, I'm not
sure the PyPI change would require an edict from a PEP, but if so, I
wouldn't be opposed to the idea; I think it would be nice to default the
stub packages to the owners of the normal packages (people should, to my
understanding, still be able to make alternate distributions without
hassle).
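
A sketch of that point, with hypothetical names -- the name given to
setuptools (and hence to PyPI) is independent of what gets installed onto
the path:

    # setup.py
    from setuptools import setup

    setup(
        name="Albatross",           # what you "pip install"
        version="0.1",
        packages=["foo_stubs"],     # what actually lands in site-packages
        package_data={"foo_stubs": ["*.pyi"]},
    )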

Ethan

>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>


Re: [Python-Dev] Standardise the AST (Re: PEP 563: Postponed Evaluation of Annotations)

2017-11-13 Thread Nathaniel Smith
Can you give any examples of problems caused by the AST not being
standardized? The original motivation of being able to distinguish between
  foo(x: int)
  foo(x: "int")
isn't very compelling – it's not clear it's a problem in the first place,
and even if it is then all we need is some kind of boolean flag, not an AST
standard.

On Nov 13, 2017 13:38, "Greg Ewing"  wrote:

> Guido van Rossum wrote:
>
>> But Python's syntax changes in nearly every release.
>>
>
> The changes are almost always additions, so there's no
> reason why the AST can't remain backwards compatible.
>
>> the AST level ... elides many details (such as whitespace and parentheses).
>>
>
> That's okay, because the AST is only expected to
> represent the semantics of Python code, not its
> exact lexical representation in the source. It's
> the same with Lisp -- comments and whitespace have
> been stripped out by the time you get to Lisp
> data.
>
>> Lisp had almost no syntax so I presume the mapping to data structures was
>> nearly trivial compared to Python.
>>
>
> Yes, the Python AST is more complicated, but we
> already have that much complexity in the AST being
> used by the compiler.
>
> If I understand correctly, we also have a process
> for converting that internal structure to and from
> an equally complicated set of Python objects, that
> isn't needed by the compiler and exists purely for
> the convenience of Python code.
>
> I can't see much complexity being added if we were
> to decide to standardise the Python representation.
>
> --
> Greg


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-13 Thread Nathaniel Smith
On Mon, Nov 13, 2017 at 6:09 AM, Serhiy Storchaka  wrote:
> 13.11.17 14:29, Antoine Pitrou wrote:
>>
>> On Mon, 13 Nov 2017 22:37:46 +1100
>> Chris Angelico  wrote:
>>>
>>> On Mon, Nov 13, 2017 at 9:46 PM, Antoine Pitrou 
>>> wrote:
>>>>
>>>> On Sun, 12 Nov 2017 19:48:28 -0800
>>>> Nathaniel Smith  wrote:
>>>>>
>>>>> On Sun, Nov 12, 2017 at 1:24 AM, Nick Coghlan 
>>>>> wrote:
>>>>>>
>>>>>> This change will lead to DeprecationWarning being displayed by default
>>>>>> for:
>>>>>>
>>>>>> * code executed directly at the interactive prompt
>>>>>> * code executed directly as part of a single-file script
>>>>>
>>>>>
>>>>> Technically it's orthogonal, but if you're trying to get better
>>>>> warnings in the REPL, then you might also want to look at:
>>>>>
>>>>> https://bugs.python.org/issue1539925
>>>>> https://github.com/ipython/ipython/issues/6611
>>>>
>>>>
>>>> Depends what you call "better".  Personally, I don't want to see
>>>> warnings each and every time I use a deprecated or questionable
>>>> construct or API from the REPL.
>>>
>>>
>>> Isn't that the entire *point* of warnings? When you're working at the
>>> REPL, you're the one in control of which APIs you use, so you should
>>> be the one to know about deprecations.
>>
>>
>> If I see a warning once every REPL session, I know about the deprecation
>> already, thank you.  I don't need to be taken by the hand like a little
>> child.  Besides, the code I write in the REPL is not meant for durable
>> use.
>
>
> Hmm, now I see that Nathaniel's simple solution is not completely
> correct. If the warning action is 'module', it should be emitted only once
> if used directly in the REPL, because '__main__' is the same module.

True. The fundamental problem is that generally, Python uses
(filename, lineno) pairs to identify lines of code. But (a) the
warning module assumes that for each namespace dict, there is a unique
mapping between line numbers and lines of code, so it ignores filename
and just keys off lineno, and (b) the REPL re-uses the same (file,
lineno) for different lines of code anyway.

So I guess the fully correct solution would be to use a unique
"filename" when compiling each block of code -- e.g. the REPL could do
the equivalent of compile(, "REPL[1]", ...) for the first line,
compile(, "REPL[2]", ...) for the second line, etc. -- and then
also teach the warnings module's duplicate detection logic to key off
of (file, lineno) pairs instead of just lineno.
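
A minimal sketch of that idea (a hypothetical helper, not actual REPL
code):

    import itertools

    _block_ids = itertools.count(1)

    def compile_repl_block(source):
        # Give each interactive block its own pseudo-filename so that the
        # (file, lineno) pair is unique per block, and duplicate-warning
        # suppression can't conflate unrelated lines of code.
        filename = "REPL[%d]" % next(_block_ids)
        return compile(source, filename, "single")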

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 561 rework

2017-11-13 Thread Ethan Smith
On Mon, Nov 13, 2017 at 3:50 PM, Sebastian Rittau 
wrote:

> Hello everyone,
>
>
> On 14.11.2017 at 00:29, Guido van Rossum wrote:
>
>> This is a nice piece of work. I expect to accept it pretty much verbatim
>> (with some small edits, see https://github.com/python/peps/pull/467). I
>> agree with Nick that we don't have to do anything specifically about
>> control of foo_stubs packages -- nor do I think we need to worry about
>> foo_stubs vs. foo-stubs.
>>
>> Everyone else: if you think this should not go through, now's the time to
>> reply-all here!
>>
> I am really looking forward to the implementation of this PEP and I am
> glad that it is close to acceptance. One thing that is not really clear to
> me is how module-only packages are handled. Say I have a package "foo" that
> installs the file "foo.py" to site-packages, where would I install
> "foo.pyi" and py.typed to? Or is this case not supported and I have to
> convert the foo module into a package containing just __init__.py?
>
>
The PEP as of right now provides no support for module-only distributions.
I don't think it would be possible to support inline typing in module-only
distributions, as a random mymod.py in site/dist-packages is hard to trust.
Refactoring into a package is probably the best option. Alternatively, I
believe a mymod_stubs package with just an __init__.pyi might work as a
workaround as well.
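
That workaround would look something like this (a sketch, not something the
PEP specifies):

    mymod_stubs/
        __init__.pyi    # the stubs for the top-level module mymod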

Ethan


>  - Sebastian


Re: [Python-Dev] PEP 561 rework

2017-11-14 Thread Ethan Smith
A note was added [1] about the solution for module-only distributions and
it is now live on Python.org.

[1] https://github.com/python/peps/pull/468

Ethan Smith

On Tue, Nov 14, 2017 at 1:02 AM, Sebastian Rittau 
wrote:

> On 14.11.2017 at 02:38, Guido van Rossum wrote:
>
> On Mon, Nov 13, 2017 at 3:50 PM, Sebastian Rittau 
> wrote:
>
>>
>
>> I am really looking forward to the implementation of this PEP and I am
>> glad that it is close to acceptance. One thing that is not really clear to
>> me is how module-only packages are handled. Say I have a package "foo" that
>> installs the file "foo.py" to site-packages, where would I install
>> "foo.pyi" and py.typed to? Or is this case not supported and I have to
>> convert the foo module into a package containing just __init__.py?
>>
>
> Good call. I think that conversion to a package is indeed the best
> approach -- it doesn't seem worth it to add more special-casing for this
> scenario.
>
> Ethan, if you agree, you should just add a sentence about this to the PEP.
>
> This seems like the best solution, especially since setuptools does not
> really support installing package data for pure modules.
>
>  - Sebastian
>


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman  wrote:
> The second way is fairly similar, but instead of replacing the entire
> sys.modules entry, its class is updated to be the class just created --
> something like sys.modules['mymod'].__class__ = MyNewClass.
>
> My request:  Can someone write a better example of the second method?  And
> include __getattr__?

Here's a fairly straightforward example:

https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

(Intentionally doesn't include __dir__ because I didn't want
deprecated attributes to show up in tab completion. For other use
cases like lazy imports, you would implement __dir__ too.)

Example usage:

https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98
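
For anyone reading this in the archive, a minimal self-contained sketch of
the same technique (hypothetical module and attribute names, not trio's
actual code):

    # mymod.py
    import sys
    import types
    import warnings

    def new_name():
        pass

    class _DeprecatingModule(types.ModuleType):
        # Only called when normal attribute lookup on the module fails.
        def __getattr__(self, name):
            if name == "old_name":
                warnings.warn("mymod.old_name is deprecated, use new_name",
                              DeprecationWarning, stacklevel=2)
                return new_name
            raise AttributeError(name)

    # Swap the class of the already-imported module object in place.
    sys.modules[__name__].__class__ = _DeprecatingModule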

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 5:49 PM, Guido van Rossum  wrote:
>> If not, why not, and if so, shouldn't PEP 562's __getattr__ also take a
>> 'self'?
>
> Not really, since there's only one module (the one containing the
> __getattr__ function). Plus we already have a 1-argument module-level
> __getattr__ in mypy. See PEP 484.
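
For reference, that stub-file form looks something like this (a sketch; the
module-level __getattr__ in a stub tells the checker to treat any unknown
attribute of the module as Any):

    # foo.pyi
    from typing import Any

    def __getattr__(name: str) -> Any: ...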

I guess the benefit of taking 'self' would be that it would make it
possible (though still a bit odd-looking) to have reusable __getattr__
implementations, like:

# mymodule.py
from auto_importer import __getattr__, __dir__

auto_import_modules = {"foo", "bar"}

# auto_importer.py
import importlib

def __getattr__(self, name):
    if name in self.auto_import_modules:
        # Hypothetical: import the module lazily on first attribute access.
        return importlib.import_module(name)
    raise AttributeError(name)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] module customization

2017-11-15 Thread Nathaniel Smith
On Wed, Nov 15, 2017 at 10:14 PM, Nathaniel Smith  wrote:
> On Wed, Nov 15, 2017 at 4:27 PM, Ethan Furman  wrote:
>> The second way is fairly similar, but instead of replacing the entire
>> sys.modules entry, its class is updated to be the class just created --
>> something like sys.modules['mymod'].__class__ = MyNewClass.
>>
>> My request:  Can someone write a better example of the second method?  And
>> include __getattr__?

Doh, I forgot to permalinkify those. Better links for anyone reading
this in the future:

> Here's a fairly straightforward example:
>
> https://github.com/python-trio/trio/blob/master/trio/_deprecate.py#L114-L140

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/_deprecate.py#L114-L140

> (Intentionally doesn't include __dir__ because I didn't want
> deprecated attributes to show up in tab completion. For other use
> cases like lazy imports, you would implement __dir__ too.)
>
> Example usage:
>
> https://github.com/python-trio/trio/blob/master/trio/__init__.py#L66-L98

https://github.com/python-trio/trio/blob/3edfafeedef4071646a9015e28be01f83dc02f94/trio/__init__.py#L66-L98

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] PEP 565: Show DeprecationWarning in __main__

2017-11-19 Thread Nathaniel Smith
On Sun, Nov 19, 2017 at 2:26 AM, Serhiy Storchaka  wrote:
> It seems to me that most of the issues with FutureWarning on GitHub [1] are
> related to NumPy and pandas, which use FutureWarning for its original nominal
> purpose: warning about programming interfaces whose behavior will change in
> the future. This doesn't concern end users unless the end user is the author
> of the code.
>
> [1] https://github.com/search?q=FutureWarning&type=Issues

Eh, numpy does use FutureWarning for changes where the same code will
transition from doing one thing to doing something else without
passing through a state where it raises an error. But that decision
was based on FutureWarning being shown to users by default, not
because it matches the nominal purpose :-). IIRC I proposed this
policy for NumPy in the first place, and I still don't even know if it
matches the original intent because the docs are so vague. "Will
change behavior in the future" describes every case where you might
consider using FutureWarning *or* DeprecationWarning, right?

We have been using DeprecationWarning for changes where code will
transition from working -> raising an error, and that *is* based on
the Official Recommendation to hide those by default whenever
possible. We've been doing this for a few years now, and I'd say our
experience so far has been... poor. I'm trying to figure out how to
say this politely. Basically it doesn't work at all. What happens in
practice is that we issue a DeprecationWarning for a year, mostly
no-one notices, then we make the change in a 1.x.0 release, everyone's
code breaks, we roll it back in 1.x.1, and then possibly repeat
several times in 1.(x+1).0 and 1.(x+2).0 until enough people have
updated their code that the screams die down. I'm pretty sure we'll be
changing our policy at some point, possibly to always use
FutureWarning for everything.
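
For reference, the split looks something like this (hypothetical function
names; the point is just which category is visible by default):

    import warnings

    def behavior_will_change():
        # FutureWarning is shown by default, so end users actually see
        # "same code, different result later" changes coming.
        warnings.warn("the default will change in a future release",
                      FutureWarning, stacklevel=2)

    def will_become_an_error():
        # DeprecationWarning is hidden by default (outside -W and, per
        # PEP 565, __main__), which in practice means no-one notices
        # until the error actually lands.
        warnings.warn("this will raise an error in a future release",
                      DeprecationWarning, stacklevel=2)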

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 4:22 PM, Guido van Rossum  wrote:
> The more I hear about this topic, the more I think that `await`, `yield` and
> `yield from` should all be banned from occurring in all comprehensions and
> generator expressions. That's not much different from disallowing `return`
> or `break`.

I would say that banning `yield` and `yield from` is like banning
`return` and `break`, but banning `await` is like banning function
calls. There's no reason for most users to even know that `await` is
related to generators, so a rule disallowing it inside comprehensions
is just confusing. AFAICT 99% of the confusion around async/await is
because people think of them as being related to generators, when from
the user point of view it's not true at all and `await` is just a
funny function-call syntax.

Also, at the language level, there's a key difference between these
cases. A comprehension has implicit `yield`s in it, and then mixing in
explicit `yield`s as well obviously leads to confusion. But when you
use an `await` in a comprehension, that turns it into an async
generator expression (thanks to PEP 530), and in an async generator,
`yield` and `await` use two separate, unrelated channels. So there's
no confusion or problem with having `await` inside a comprehension.
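
Concretely, something like this (a sketch; asyncio.run is the 3.7
spelling):

    import asyncio

    async def double(x):
        await asyncio.sleep(0)
        return 2 * x

    async def main():
        # PEP 530: an await inside a generator expression makes it an
        # *async* generator expression...
        agen = (await double(x) for x in range(3))
        # ...which is consumed with async iteration; its yields and its
        # awaits travel on separate, unrelated channels.
        return [value async for value in agen]

    print(asyncio.run(main()))  # prints [0, 2, 4]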

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan  wrote:
> def example():
>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>     return comp1, comp2

Isn't this a really confusing way of writing

def example():
    return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]

?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Tricky way of creating a generator via a comprehension expression

2017-11-24 Thread Nathaniel Smith
On Fri, Nov 24, 2017 at 9:39 PM, Nick Coghlan  wrote:
> On 25 November 2017 at 15:27, Nathaniel Smith  wrote:
>> On Fri, Nov 24, 2017 at 9:04 PM, Nick Coghlan  wrote:
>>> def example():
>>>     comp1 = yield from [(yield x) for x in ('1st', '2nd')]
>>>     comp2 = yield from [(yield x) for x in ('3rd', '4th')]
>>>     return comp1, comp2
>>
>> Isn't this a really confusing way of writing
>>
>> def example():
>>     return [(yield '1st'), (yield '2nd')], [(yield '3rd'), (yield '4th')]
>
> A real use case

Do you have a real use case? This seems incredibly niche...

> wouldn't be iterating over hardcoded tuples in the
> comprehensions, it would be something more like:
>
> def example(iterable1, iterable2):
>     comp1 = yield from [(yield x) for x in iterable1]
>     comp2 = yield from [(yield x) for x in iterable2]
>     return comp1, comp2

I submit that this would still be easier to understand if written out like:

def map_iterable_to_yield_values(iterable):
    "Yield the values in iterable, then return a list of the values sent back."
    result = []
    for obj in iterable:
        result.append((yield obj))
    return result

def example(iterable1, iterable2):
    values1 = yield from map_iterable_to_yield_values(iterable1)
    values2 = yield from map_iterable_to_yield_values(iterable2)
    return values1, values2

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

