Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nathaniel Smith
On Fri, Jun 2, 2017 at 1:29 PM, Terry Reedy  wrote:
> On 6/2/2017 12:21 PM, Barry Warsaw wrote:
>>
>> On Jun 03, 2017, at 02:10 AM, Nick Coghlan wrote:
>
>
>>> The benefit of making any backport a private API is that it would mean
>>> we weren't committing to support that API for general use: it would be
>>> supported *solely* for the use case discussed in the PEP (i.e. helping
>>> to advance the development of PEP 543 without breaking pip
>>> bootstrapping in the process).
>>
>>
>> That sounds like a good compromise.  My own major objection was in
>> exposing a
>> new public API in Python 2.7, which would clearly be a new feature.
>
>
> Which would likely be seen by someone as justifying other requests to add to
> 2.7 'just this one more essential new feature' ;-).

Whatever the eventual outcome, I don't think there's any danger
someone will read this thread and think "wow, it's so easy to get new
features into 2.7".

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 484 update proposal: annotating decorated declarations

2017-06-02 Thread Guido van Rossum
On Fri, Jun 2, 2017 at 1:07 PM, Koos Zevenhoven  wrote:

> [...]
> I suppose it is, especially because there seems to be nothing that
> prevents you from getting runtime annotations in the enclosing class/module
> ​:
>
>
> number: int
>
> @call
> def number():
> return 42
>

Well mypy actually gives an error for that, "Name 'number' already defined".


>
> But for functions one could have (
> ​using
>  the context manager example):
>
>
> def session(url: str) -> ContextManager[DatabaseSession]: ...
>
> @predeclared
> @contextmanager
> def session(url: str) -> Iterator[DatabaseSession]:
> s = DatabaseSession(url)
> try:
> yield s
> finally:
> s.close()
>
>
> This makes it clear that the function is declared elsewhere. But the
> `predeclared` decorator would need tricks like sys._getframe(1) to set
> session.__annotations__ according to the predeclaration.
>

I'm not excited about that.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 544: Protocols - second round

2017-06-02 Thread Ivan Levkivskyi
On 1 June 2017 at 00:10, Guido van Rossum  wrote:

>
> On Wed, May 31, 2017 at 2:16 AM, Ivan Levkivskyi 
> wrote:
>
>> On 31 May 2017 at 00:58, Guido van Rossum  wrote:
>> [...]
>>
>> Thank you for very detailed answers! I have practically nothing to add.
>> It seems to me that most of the Kevin's questions stem from unnecessary
>> focus
>> on runtime type checking. Here are two ideas about how to fix this:
>>
>> * Add the word "static" somewhere in the PEP title.
>>
>
> So the title could become "Protocols: Static structural subtyping (duck
> typing)" -- long, but not record-setting.
>

I am thinking about "Protocols: Structural subtyping (static duck typing)".
The reason is that subtyping is already a mostly static concept (in
contrast to subclassing),
while duck typing is typically associated with the runtime behaviour.

This might seem minor, but this version of the title sounds much more
naturally to me.

--
Ivan
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Terry Reedy

On 6/2/2017 12:21 PM, Barry Warsaw wrote:

On Jun 03, 2017, at 02:10 AM, Nick Coghlan wrote:



The benefit of making any backport a private API is that it would mean
we weren't committing to support that API for general use: it would be
supported *solely* for the use case discussed in the PEP (i.e. helping
to advance the development of PEP 543 without breaking pip
bootstrapping in the process).


That sounds like a good compromise.  My own major objection was in exposing a
new public API in Python 2.7, which would clearly be a new feature.


Which would likely be seen by someone as justifying other requests to 
add to 2.7 'just this one more essential new feature' ;-).



--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 484 update proposal: annotating decorated declarations

2017-06-02 Thread Koos Zevenhoven
On Fri, Jun 2, 2017 at 8:57 PM, Guido van Rossum  wrote:
> On Fri, Jun 2, 2017 at 9:41 AM, Koos Zevenhoven  wrote:
>>
>> I still don't understand what would happen with __annotations__. If
>> the decorator returns a non-function, one would expect the annotations
>> to be in the __annotations__ attribute of the enclosing class or
>> module. If it returns a function, they would be in the __annotations__
>> attribute of the function. And I'm talking about the runtime behavior
>> in Python as explained in PEP484 and PEP526. I would expect these
>> declarations to behave according to the same principles as other ways
>> to annotate variables/functions. If there is no runtime behavior, a
>> comment-based syntax might be more appropriate. Or have I missed
>> something?
>
>
> So when returning a function, the runtime version of the decorator can
> easily update the function's __annotations__. But when returning a
> non-function, the decorator would have a hard time updating
__annotations__
> of the containing class/module without "cheating" (e.g. sys._getframe()).
I
> think the latter is similar to e.g. attributes defined with @property --
> those don't end up in __annotations__ either. I think this is an
acceptable
> deficiency.
>

I suppose it is, especially because there seems to be nothing that prevents
you from getting runtime annotations in the enclosing class/module
​:


number: int

@call
def number():
return 42


But for functions one could have (
​using
 the context manager example):


def session(url: str) -> ContextManager[DatabaseSession]: ...

@predeclared
@contextmanager
def session(url: str) -> Iterator[DatabaseSession]:
s = DatabaseSession(url)
try:
yield s
finally:
s.close()


This makes it clear that the function is declared elsewhere. But the
`predeclared` decorator would need tricks like sys._getframe(1) to set
session.__annotations__ according to the predeclaration.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Larry Hastings


On 06/02/2017 02:38 AM, Antoine Pitrou wrote:

I hope those are not the actual numbers you're intending to use ;-)
I still think that allocating more than 1 or 2MB at once would be
foolish.  Remember this is data that's going to be carved up into
(tens of) thousands of small objects.  Large objects eschew the small
object allocator (not to mention that third-party libraries like Numpy
may be using different allocation routines when they allocate very
large data).


Honest, I'm well aware of what obmalloc does and how it works.  I bet 
I've spent more time crawling around in it in the last year than anybody 
else on the planet.  Mainly because it works so well for CPython, nobody 
else needed to bother!


I'm also aware, for example, that if your process grows to consume 
gigabytes of memory, you're going to have tens of thousands of allocated 
arenas.  The idea that on systems with gigabytes of memory--90%+? of 
current systems running CPython--we should allocate memory forever in 
256kb chunks is faintly ridiculous.  I agree that we should start small, 
and ramp up slowly, so Python continues to run well on small computers 
and not allocate tons of memory for small programs.  But I also think we 
should ramp up *ever*, for programs that use tens or hundreds of megabytes.


Also note that if we don't touch the allocated memory, smart modern OSes 
won't actually commit any resources to it.  All that happens when your 
process allocates 1GB is that the OS changes some integers around.  It 
doesn't actually commit any memory to your process until you attempt to 
write to that memory, at which point it gets mapped in in 
local-page-size chunks (4k? 8k? something in that neighborhood and 
power-of-2 sized).  So if we allocate 32mb, and only touch the first 
1mb, the other 31mb doesn't consume any real resources.  I was planning 
on making the multi-arena code only touch memory when it actually needs 
to, similarly to the way obmalloc lazily consumes memory inside an 
allocated pool (see the nextoffset field in pool_header), to take 
advantage of this ubiquitous behavior.



If I write this multi-arena code, which I might, I was thinking I'd try 
this approach:


 * leave arenas themselves at 256k
 * start with a 1MB multi-arena size
 * every time I allocate a new multi-arena, multiply the size of the
   next multi-arena by 1.5 (rounding up to 256k each time)
 * every time I free a multi-arena, divide the size of the next
   multi-arena by 2 (rounding up to 256k each time)
 * if allocation of a multi-arena fails, use a binary search algorithm
   to allocate the largest multi-arena possible (rounding up to 256k at
   each step)
 * cap the size of multi arenas at, let's say, 32mb

So multi-arenas would be 1mb, 1.5mb, 2.25mb, 3.5mb (round up!), etc.


Fun fact: Python allocates 16 arenas at the start of the program, just 
to initialize obmalloc.  That consumes 4mb of memory.  With the above 
multi-arena approach, that'd allocate the first three multi-arenas, 
pre-allocating 19 arenas, leaving 3 unused.  It's *mildly* tempting to 
make the first multi-arena be 4mb, just so this is exactly right-sized, 
but... naah.



//arry/
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Larry Hastings


On 06/02/2017 12:09 PM, Tim Peters wrote:
I should note that Py_ADDRESS_IN_RANGE is my code - this isn't a 
backhanded swipe at someone else.


One minor note.  During the development of 3.6, CPython started 
permitting some C99-isms, including static inline functions. 
Py_ADDRESS_IN_RANGE was therefore converted from a macro into a static 
inline function, and its new name is address_in_range.


Just so we're all on the same (4k!) page,


//arry/
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Larry Hastings



On 06/02/2017 02:46 AM, Victor Stinner wrote:

I would be curious of another test: use pymalloc for objects larger
than 512 bytes. For example, allocate up to 4 KB?

In the past, we already changed the maximum size from 256 to 512 to
support most common Python objects on 64-bit platforms. Since Python
objects contain many pointers: switching from 32 bit to 64 bit can
double the size of the object in the worst case.


You've already seen Tim Peters' post about why we must leave pool size 
set to 4k.  Obviously This in turn means using obmalloc for larger 
objects will mean more and more wasted memory.


For example, let's say we use obmalloc for allocations of 2048 bytes.  
Pool size is 4096 bytes, and there's a 48-byte "pool_header" structure 
on the front (on 64-bit platforms, if I counted right). So there are 
only 4048 bytes usable per pool.  After the first 2048 allocation, we're 
left with 2000 bytes at the end.  You can't use that memory for another 
allocation class, that's impossible given obmalloc's design.  So that 
2000 bytes is just wasted.


Currently obmalloc's maximum allocation size is 512 bytes; after 7 
allocations, this leaves 464 wasted bytes at the end.  Which isn't 
*great* exactly but it's only 11% of the overall allocated memory.


Anyway, I'm not super excited by the prospect of using obmalloc for 
larger objects.  There's an inverse relation between the size of 
allocation and the frequency of allocation.  In Python there are lots of 
tiny allocations, but fewer and fewer as the size increases.  (A 
similarly-shaped graph to what retailers call the "long tail".)  By no 
small coincidence, obmalloc is great at small objects, which is where we 
needed the help most.  Let's leave it at that.



A more fruitful endeavor might be to try one of these fancy new 
third-party allocators in CPython, e.g. tcmalloc, jemalloc.  Try each 
with both obmalloc turned on and turned off, and see what happens to 
performance and memory usage.  (I'd try it myself, but I'm already so 
far behind on watching funny cat videos.)



//arry/
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Tim Peters
[Tim]
>> While I would like to increase the pool size, it's fraught with
>> danger.

[Antoine Pitrou ]
> What would be the point of increasing the pool size?  Apart from being
> able to allocate 4KB objects out of it, I mean.
>
> Since 4KB+ objects are relatively uncommon (I mean we don't allocate
> hundreds of thousands of them per second), I don't think it's really
> worthwhile trying to have the small object allocator handle them.

I don't care about "large" objects here.  It's educated intuition
about speed:  so long as pymalloc is working within a pool, it's
blazing fast.  When it has to muck with a new pool, much slower
(compared to staying within a pool) code is required.  When it has to
muck with a new arena, slower still.

So the intuition is simple:  the larger a pool, the more object
operations it can handle staying within its best-case (fastest) code
paths.



>> It would be nice to find a different way for pymalloc to figure out
>> which addresses belong to it.  The excruciating Py_ADDRESS_IN_RANGE
>> manages to do it in small constant (independent of the number of
>> arenas and pools in use) time, which is its only virtue ;-)

> So, to sum it up, it's excruciating but fast and works reliably.  Why
> change it?

To enable using pools larger than the greatest common divisor of all
OS's native pool sizes.

There's also that Py_ADDRESS_IN_RANGE is responsible for countless
hours of head-scratching by poor developers trying to use magical
memory debuggers - it's at best unusual for code to read up memory
without caring a bit whether the memory has ever been stored to.

I should note that Py_ADDRESS_IN_RANGE is my code - this isn't a
backhanded swipe at someone else.  It's always been near the top of
the "code stink" scale.  So I thought I'd mention it in case someone
has been sitting on a cleaner idea but has held back because they
didn't want to offend me ;-)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Antoine Pitrou
On Fri, 2 Jun 2017 13:23:05 -0500
Tim Peters  wrote:
> 
> While I would like to increase the pool size, it's fraught with
> danger.

What would be the point of increasing the pool size?  Apart from being
able to allocate 4KB objects out of it, I mean.

Since 4KB+ objects are relatively uncommon (I mean we don't allocate
hundreds of thousands of them per second), I don't think it's really
worthwhile trying to have the small object allocator handle them.

> It would be nice to find a different way for pymalloc to figure out
> which addresses belong to it.  The excruciating Py_ADDRESS_IN_RANGE
> manages to do it in small constant (independent of the number of
> arenas and pools in use) time, which is its only virtue ;-)

So, to sum it up, it's excruciating but fast and works reliably.  Why
change it?

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Tim Peters
[INADA Naoki ]
> ...
> Since current pool size is 4KB and there is pool_header in pool,
> we can't allocate 4KB block from pool.
> And if support 1KB block, only 3KB of 4KB can be actually used.
> I think 512 bytes / 4KB (1/8) is good ratio.
>
> Do you mean increase pool size?
>
> How about adding configure option like server-mode?
>
> SMALL_REQUEST_THRESHOLD  1024   // 2x
> POOL_SIZE  (16*1024)// 4x
> ARENA_SIZE  (2*1024*1024)   // 8x, and same to huge page size.

While I would like to increase the pool size, it's fraught with
danger.  The problem:  Py_ADDRESS_IN_RANGE has to figure out whether
an "arbitrary" address is - or is not - controlled by Python's
small-object allocator.  The excruciating logic is spelled out at
length in obmalloc.c.

As is, the code reads up memory near "the start" of a pool to get the
pool's belief about which arena it's in.  If the memory is not in fact
controlled by pymalloc, this can be anything, even uninitialized
trash.  That's fine (as explained in the comments) - but that memory
_must_ be readable.

And that's why POOL_SIZE is set to 4K:  it's the smallest page size
across all the OSes Python is known to run on.  If pymalloc is handed
an "arbitrary" (but valid) address, the entire OS page containing that
address is readable.

If, e.g., an OS allocates 4K pages, but Python's POOL_SIZE is 8K
(anything bigger than the OS's page allocation unit), then perhaps the
OS page at 16K is unallocated, but the page at 20K is.  pymalloc sees
an address at 21K.  As is, pymalloc reads the putative arena index at
20K, which is fine.  But if POOL_SIZE were 8K, it would try to read
the putative arena index at 16K, and - boom! - segfault.

This failure mode would be rare but - of course - catastrophic when it occurs.

It would be nice to find a different way for pymalloc to figure out
which addresses belong to it.  The excruciating Py_ADDRESS_IN_RANGE
manages to do it in small constant (independent of the number of
arenas and pools in use) time, which is its only virtue ;-)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 484 update proposal: annotating decorated declarations

2017-06-02 Thread Guido van Rossum
On Fri, Jun 2, 2017 at 9:41 AM, Koos Zevenhoven  wrote:

> On Fri, Jun 2, 2017 at 6:34 PM, Naomi Seyfer  wrote:
> > Yep, interested in implementing it!  I will put implementation time on my
> > schedule and tell y'all when it is, for holding myself accountable -- it
> > turns out I never do anything not on my schedule.
>
> I still don't understand what would happen with __annotations__. If
> the decorator returns a non-function, one would expect the annotations
> to be in the __annotations__ attribute of the enclosing class or
> module. If it returns a function, they would be in the __annotations__
> attribute of the function. And I'm talking about the runtime behavior
> in Python as explained in PEP484 and PEP526. I would expect these
> declarations to behave according to the same principles as other ways
> to annotate variables/functions. If there is no runtime behavior, a
> comment-based syntax might be more appropriate. Or have I missed
> something?
>

So when returning a function, the runtime version of the decorator can
easily update the function's __annotations__. But when returning a
non-function, the decorator would have a hard time updating __annotations__
of the containing class/module without "cheating" (e.g. sys._getframe()). I
think the latter is similar to e.g. attributes defined with @property --
those don't end up in __annotations__ either. I think this is an acceptable
deficiency.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nathaniel Smith
On Jun 2, 2017 7:24 AM, "Ben Darnell"  wrote:

The PEP's rationale is now "This PEP will help facilitate the future
adoption
of :pep:`543` across all supported Python versions, which will improve
security
for both Python 2 and Python 3 users."

What exactly are these security improvements? My understanding (which may
well be incorrect) is that the security improvements come in the future
when the PEP 543 APIs are implemented on top of the various platform-native
security libraries. These changes will not be present until some future 3.x
release, and will not be backported to 2.7 (without another PEP, which I
expect would be even more contentious than this one). What then is the
security improvement for Python 2 users?


My understanding is that PEP 543 would be released as a library on pypi, as
well as targeting stdlib inclusion in some future release. The goal of the
MemoryBIO PEP is to allow the PEP 543 package to be pure-python, support
all the major platforms, and straddle py2/py3


In Tornado, I have not felt any urgency to replace wrap_socket with
MemoryBIO. Is there a security-related reason I should do so sooner rather
than later? (I'd also urge Cory and any other wrap_socket skeptics on the
requests team to reconsider - Tornado's SSLIOStream works well. The
asynchronous use of wrap_socket isn't so subtle and risky with buffering)


I'll leave the discussion of wrap_socket's reliability to others, because I
don't have any experience there, but I do want to point out that that which
primitive you pick has major system design consequences. MemoryBIO is an
abstraction that can implement wrap_socket, but not vice-versa; if you use
wrap_socket as your fundamental primitive then that leaks all over your
abstraction hierarchy.

Twisted has to have a MemoryBIO​ implementation of their TLS code, because
they want to support TLS over arbitrary transports. Carrying a wrap_socket
implementation as well would mean twice the tricky and security-sensitive
code to maintain, plus breaking their current abstractions to expose
whether any particular transport object is actually just a raw socket.

The problem gets worse when you add PEP 543's pluggable TLS backends. If
you can require a MemoryBIO-like API, then adding a backend becomes a
matter of defining like 4 methods with relatively simple, testable
interfaces, and it automatically works everywhere. Implementing a
wrap_socket interface on top of this is complicated because the socket API
is complicated and full of subtle corners, but it only has to be done once
and libraries like tornado can adapt to the quirks of the one
implementation.

OTOH if you have to also support backends that only have wrap_socket, then
this multiplies the complexity of everything. Now we need a way to
negotiate which APIs each backend supports, and we need to somehow document
all the weird corners of how wrapped sockets are supposed to behave in edge
cases, and when different wrapped sockets inevitably behave differently
then libraries like tornado need to discover those differences and support
all the options. We need a way to explain to users why some backends work
with some libraries but not others. And so on. And we still need to support
MemoryBIOs in as many cases as possible.

-n
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Donald Stufft

> On Jun 2, 2017, at 12:41 PM, Antoine Pitrou  wrote:
> 
> On Fri, 2 Jun 2017 12:22:06 -0400
> Donald Stufft  wrote:
>> 
>> It’s not just bootstrapping that pip has a problem with for C extensions, it 
>> also prevents upgrading PyOpenSSL on Windows because having pip import 
>> PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the 
>> pip process exits and no longer imports PyOpenSSL. This isn’t a problem on 
>> Linux or macOS or the other *nix clients though. We patch requests as it is 
>> today to prevent it from importing simplejson and cryptography for this 
>> reason.
> 
> Does pip use any advanced features in Requests, at least when it comes
> to downloading packages (which is where the bootstrapping issue lies
> AFAIU)? Because at this point it should like you may be better off with
> a simple pure Python HTTP downloader.
> 

It’s hard to fully answer the question because it sort of depends?

Could we switch to just like, urllib2 or something? Yea we could, in fact we 
used to use that and switched to using requests because we had to backport 
security work around / fixes ourselves (the big one at the time was host name 
matching/verification) and we were really bad at keeping up with tracking what 
patches needed applying and when. Switching to requests let us offload that 
work to the requests team who are doing a phenomenal job at it.

Beyond that though, getting HTTP right is hard, and pip used to have to try and 
implement work arounds to either broken or less optimal urllib2 behavior 
whereas requests generally gets it right for us out of the box.

Closer to your specific questions about features, we’re using the requests 
session support to handle connection pooling to speed up downloading (since we 
don’t need to open a new connection for every download then), the adapter API 
to handle transparently allowing file:// URLs, the auth framework to handle 
holding authentication for multiple domains at once, and the third party 
library cachecontrol to handle our HTTP caching using a browser style cache.

I suspect (though I’d let him speak for himself) that Cory would rather 
continue to be sync only than require pip to go back to not using requests.


—
Donald Stufft



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread David Wilson
On Sat, Jun 03, 2017 at 02:10:50AM +1000, Nick Coghlan wrote:

> * and figure out some other pip-specific option for ensurepip
> bootstrapping (like a *private* MemoryBIO implementation, or falling
> back to synchronous mode in requests)

Ignoring Ben's assertion regarding the legitimacy of async
wrap_socket() (which seems to render this entire conversation moot), if
you still really want to go this route, could ctypes be abused to
provide the missing implementation from the underlying libs? It'd be a
hack, but it would only be necessary during bootstrapping.


David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Donald Stufft

> On Jun 2, 2017, at 12:39 PM, Nick Coghlan  wrote:
> 
> On 3 June 2017 at 02:22, Donald Stufft  wrote:
>> It’s not just bootstrapping that pip has a problem with for C extensions, it
>> also prevents upgrading PyOpenSSL on Windows because having pip import
>> PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the
>> pip process exits and no longer imports PyOpenSSL. This isn’t a problem on
>> Linux or macOS or the other *nix clients though. We patch requests as it is
>> today to prevent it from importing simplejson and cryptography for this
>> reason.
> 
> Would requests be loading PyOpenSSL on Windows, though? If the aim is
> to facilitate PEP 543, then I'd expect it to be using the SChannel
> backend in that case.
> 


I’m not sure! The exact specifics of how it’d get implemented and the 
transition from what we have now to that could be yes or no (particularly the 
transition period). I’m just making sure that the constraint we have in pip is 
clearly defined here to make sure we don’t accept something that ends up not 
actually being suitable. I don’t have an opinion on the private bootstrap 
module (well I do, I like it less than just back porting MemoryBio “for real", 
but not one on whether it’d work or not).


—
Donald Stufft



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Antoine Pitrou
On Fri, 2 Jun 2017 12:22:06 -0400
Donald Stufft  wrote:
> 
> It’s not just bootstrapping that pip has a problem with for C extensions, it 
> also prevents upgrading PyOpenSSL on Windows because having pip import 
> PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the 
> pip process exits and no longer imports PyOpenSSL. This isn’t a problem on 
> Linux or macOS or the other *nix clients though. We patch requests as it is 
> today to prevent it from importing simplejson and cryptography for this 
> reason.

Does pip use any advanced features in Requests, at least when it comes
to downloading packages (which is where the bootstrapping issue lies
AFAIU)? Because at this point it should like you may be better off with
a simple pure Python HTTP downloader.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 484 update proposal: annotating decorated declarations

2017-06-02 Thread Koos Zevenhoven
On Fri, Jun 2, 2017 at 6:34 PM, Naomi Seyfer  wrote:
> Yep, interested in implementing it!  I will put implementation time on my
> schedule and tell y'all when it is, for holding myself accountable -- it
> turns out I never do anything not on my schedule.
>

I still don't understand what would happen with __annotations__. If
the decorator returns a non-function, one would expect the annotations
to be in the __annotations__ attribute of the enclosing class or
module. If it returns a function, they would be in the __annotations__
attribute of the function. And I'm talking about the runtime behavior
in Python as explained in PEP484 and PEP526. I would expect these
declarations to behave according to the same principles as other ways
to annotate variables/functions. If there is no runtime behavior, a
comment-based syntax might be more appropriate. Or have I missed
something?


—Koos



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nick Coghlan
On 3 June 2017 at 02:22, Donald Stufft  wrote:
> It’s not just bootstrapping that pip has a problem with for C extensions, it
> also prevents upgrading PyOpenSSL on Windows because having pip import
> PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the
> pip process exits and no longer imports PyOpenSSL. This isn’t a problem on
> Linux or macOS or the other *nix clients though. We patch requests as it is
> today to prevent it from importing simplejson and cryptography for this
> reason.

Would requests be loading PyOpenSSL on Windows, though? If the aim is
to facilitate PEP 543, then I'd expect it to be using the SChannel
backend in that case.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Donald Stufft

> On Jun 2, 2017, at 12:10 PM, Nick Coghlan  wrote:
> 
> On 2 June 2017 at 19:42, Victor Stinner  wrote:
>> Thanks Cory for the long explanation. Let me try to summarize (tell me
>> if I'm wrong).
>> 
>> We have 3 options:
>> 
>> * Do nothing: reject the PEP 546 and let each project handles security
>> on its own (current status co)
>> * Write *new* C code, maybe using certitude as a starting point, to
>> offload certifcate validation on Windows and macOS
>> * Backport existing code from master to 2.7: MemoryBIO and SSLObject
> 
> There's also a 4th option:
> 
> * Introduce a dependency from requests onto PyOpenSSL when running in
> async mode on Python 2.7 in the general case, and figure out some
> other pip-specific option for ensurepip bootstrapping (like a
> *private* MemoryBIO implementation, or falling back to synchronous
> mode in requests)
> 
> During the pre-publication PEP discussions, I kinda dismissed the
> PyOpenSSL dependency option out of hand due to the ensurepip
> bootstrapping issues it may introduce, but I think we need to discuss
> it further in the PEP as it would avoid some of the other challenges
> brought up here (Linux distro update latencies, potential
> complications for alternate Python 2.7 implementations, etc).
> 

It’s not just bootstrapping that pip has a problem with for C extensions, it 
also prevents upgrading PyOpenSSL on Windows because having pip import 
PyOpenSSL locks the .dll, and we can’t delete it or overwrite it until the pip 
process exits and no longer imports PyOpenSSL. This isn’t a problem on Linux or 
macOS or the other *nix clients though. We patch requests as it is today to 
prevent it from importing simplejson and cryptography for this reason.


—
Donald Stufft



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Barry Warsaw
On Jun 03, 2017, at 02:10 AM, Nick Coghlan wrote:

>* Introduce a dependency from requests onto PyOpenSSL when running in
>async mode on Python 2.7 in the general case, and figure out some
>other pip-specific option for ensurepip bootstrapping (like a
>*private* MemoryBIO implementation, or falling back to synchronous
>mode in requests)
[...]
>
>If we adopted the latter approach, then for almost all intents and
>purposes, ssl.MemoryBIO and ssl.SSLObject would remain a Python 3.5+
>only API, and anyone wanting access to it on 2.7 would still need to
>depend on PyOpenSSL.
>
>The benefit of making any backport a private API is that it would mean
>we weren't committing to support that API for general use: it would be
>supported *solely* for the use case discussed in the PEP (i.e. helping
>to advance the development of PEP 543 without breaking pip
>bootstrapping in the process).

That sounds like a good compromise.  My own major objection was in exposing a
new public API in Python 2.7, which would clearly be a new feature.

Cheers,
-Barry
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Nick Coghlan
On 2 June 2017 at 19:42, Victor Stinner  wrote:
> Thanks Cory for the long explanation. Let me try to summarize (tell me
> if I'm wrong).
>
> We have 3 options:
>
> * Do nothing: reject the PEP 546 and let each project handles security
> on its own (current status co)
> * Write *new* C code, maybe using certitude as a starting point, to
> offload certifcate validation on Windows and macOS
> * Backport existing code from master to 2.7: MemoryBIO and SSLObject

There's also a 4th option:

* Introduce a dependency from requests onto PyOpenSSL when running in
async mode on Python 2.7 in the general case, and figure out some
other pip-specific option for ensurepip bootstrapping (like a
*private* MemoryBIO implementation, or falling back to synchronous
mode in requests)

During the pre-publication PEP discussions, I kinda dismissed the
PyOpenSSL dependency option out of hand due to the ensurepip
bootstrapping issues it may introduce, but I think we need to discuss
it further in the PEP as it would avoid some of the other challenges
brought up here (Linux distro update latencies, potential
complications for alternate Python 2.7 implementations, etc).

For example:

* if requests retains a synchronous mode fallback implementation, then
ensurepip could use that in the absence of PyOpenSSL
* even if requests drops synchronous mode entirely, we could leave the
public ssl module API alone, and add an _ensurepip bootstrap module
specifically for use in the absence of a full PyOpenSSL module

If we adopted the latter approach, then for almost all intents and
purposes, ssl.MemoryBIO and ssl.SSLObject would remain a Python 3.5+
only API, and anyone wanting access to it on 2.7 would still need to
depend on PyOpenSSL.

The benefit of making any backport a private API is that it would mean
we weren't committing to support that API for general use: it would be
supported *solely* for the use case discussed in the PEP (i.e. helping
to advance the development of PEP 543 without breaking pip
bootstrapping in the process).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Summary of Python tracker Issues

2017-06-02 Thread Python tracker

ACTIVITY SUMMARY (2017-05-26 - 2017-06-02)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open6025 (+26)
  closed 36284 (+43)
  total  42309 (+69)

Open issues with patches: 2370 


Issues opened (54)
==

#30486: Allow setting cell value
http://bugs.python.org/issue30486  opened by pitrou

#30487: DOC: automatically create a venv and install Sphinx when runni
http://bugs.python.org/issue30487  opened by cjrh

#30489: Add CmdLineTest to standard library
http://bugs.python.org/issue30489  opened by Santiago Castro

#30490: Allow pass an exception to the Event.set method
http://bugs.python.org/issue30490  opened by pfreixes

#30491: Add a lightweight mechanism for detecting un-awaited coroutine
http://bugs.python.org/issue30491  opened by njs

#30492: 'make clinic' does not work for out of tree builds / clinic.py
http://bugs.python.org/issue30492  opened by gregory.p.smith

#30493: Increase coverage of base64
http://bugs.python.org/issue30493  opened by leecannon

#30495: IDLE: modernize textview module
http://bugs.python.org/issue30495  opened by terry.reedy

#30498: Run Python's slowest tests in the first 3/4 of tests when usin
http://bugs.python.org/issue30498  opened by brett.cannon

#30500: urllib connects to a wrong host
http://bugs.python.org/issue30500  opened by Nam.Nguyen

#30501: Produce optimized code for boolean conditions
http://bugs.python.org/issue30501  opened by serhiy.storchaka

#30502: Fix buffer handling of OBJ_obj2txt
http://bugs.python.org/issue30502  opened by christian.heimes

#30504: Allow inspecting buffering attribute of IO objects
http://bugs.python.org/issue30504  opened by pitrou

#30505: Performance of typing._ProtocolMeta._get_protocol_attrs and is
http://bugs.python.org/issue30505  opened by orenbenkiki

#30508: "Task exception was never retrieved" reported for a canceled t
http://bugs.python.org/issue30508  opened by Miguel Grinberg

#30509: Optimize calling type slots
http://bugs.python.org/issue30509  opened by serhiy.storchaka

#30510: c_bool type not supported for BigEndianStructure on little-end
http://bugs.python.org/issue30510  opened by Hassan El Karouni

#30511: shutil.make_archive should not need to chdir (alternatively: m
http://bugs.python.org/issue30511  opened by Alex Gaynor

#30512: CAN Socket support for NetBSD
http://bugs.python.org/issue30512  opened by wiz

#30513: getrusage returns platform-dependent value
http://bugs.python.org/issue30513  opened by sam-s

#30514: test_poplib replace asyncore
http://bugs.python.org/issue30514  opened by grzgrzgrz3

#30516: Documentation for datetime substract operation incorrect?
http://bugs.python.org/issue30516  opened by René Hernández Remedios

#30518: Import type aliases from another module
http://bugs.python.org/issue30518  opened by Paragape

#30519: Add daemon argument to Timer
http://bugs.python.org/issue30519  opened by awolokita

#30520: loggers can't be pickled
http://bugs.python.org/issue30520  opened by pitrou

#30521: IDLE: Add navigate bar and replace current goto dialog
http://bugs.python.org/issue30521  opened by louielu

#30522: Allow replacing a logging.StreamHandler's stream
http://bugs.python.org/issue30522  opened by pitrou

#30523: unittest: add --list-tests option to only display the list of 
http://bugs.python.org/issue30523  opened by haypo

#30524: iter(classmethod, sentinel) broken for Argument Clinic class m
http://bugs.python.org/issue30524  opened by mjpieters

#30525: Expose SCTs on TLS connections
http://bugs.python.org/issue30525  opened by alex

#30526: Allow setting line_buffering on existing TextIOWrapper
http://bugs.python.org/issue30526  opened by pitrou

#30528: ipaddress.IPv{4,6}Network.reverse_pointer is broken
http://bugs.python.org/issue30528  opened by h.venev

#30529: Incorrect error messages for invalid whitespaces in f-string s
http://bugs.python.org/issue30529  opened by serhiy.storchaka

#30530: Descriptors HowTo: Example on function.__get__ needs update
http://bugs.python.org/issue30530  opened by Mariano Anaya

#30532: email.policy.SMTP.fold() mangles long headers
http://bugs.python.org/issue30532  opened by chr...@emergence.com

#30533: missing feature in inspect module: getmembers_static
http://bugs.python.org/issue30533  opened by carljm

#30534: error message for incorrect call degraded in 3.7
http://bugs.python.org/issue30534  opened by scoder

#30535: Warn that meta_path is not empty
http://bugs.python.org/issue30535  opened by xmorel

#30536: [EASY] SubinterpThreadingTests.test_threads_join_2() of test_t
http://bugs.python.org/issue30536  opened by haypo

#30537: Using PyNumber_AsSsize_t in itertools.islice
http://bugs.python.org/issue30537  opened by MSeifert

#30538: Functional Programming HOWTO describes one argument itertools.
http://bugs.python.org/issue30538  opened by csabella

#30539: Make Proactor 

Re: [Python-Dev] PEP 484 update proposal: annotating decorated declarations

2017-06-02 Thread Naomi Seyfer
Yep, interested in implementing it!  I will put implementation time on my
schedule and tell y'all when it is, for holding myself accountable -- it
turns out I never do anything not on my schedule.

On Wed, May 31, 2017 at 3:17 PM, Guido van Rossum  wrote:

> On Wed, May 31, 2017 at 6:16 AM, Ivan Levkivskyi 
> wrote:
>
>> On 30 May 2017 at 23:02, Guido van Rossum  wrote:
>>
>>> All in all I'm still leaning towards Naomi's original proposal -- it
>>> looks simpler to implement as well.
>>>
>>
>> OK, I think having a bit of verbosity is absolutely fine if we win
>> simplicity of implementation (for both static and runtime purposes).
>>
>
> Then I propose to do it this way. We can always add Jukka's way as an
> alternative notation later. I'd like to hear from Jukka before I merge the
> PR for PEP-484.
>
> In the meantime, Naomi, are you interested in trying to implement this?
>
> --
> --Guido van Rossum (python.org/~guido)
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Ben Darnell
The PEP's rationale is now "This PEP will help facilitate the future
adoption
of :pep:`543` across all supported Python versions, which will improve
security
for both Python 2 and Python 3 users."

What exactly are these security improvements? My understanding (which may
well be incorrect) is that the security improvements come in the future
when the PEP 543 APIs are implemented on top of the various platform-native
security libraries. These changes will not be present until some future 3.x
release, and will not be backported to 2.7 (without another PEP, which I
expect would be even more contentious than this one). What then is the
security improvement for Python 2 users?

In Tornado, I have not felt any urgency to replace wrap_socket with
MemoryBIO. Is there a security-related reason I should do so sooner rather
than later? (I'd also urge Cory and any other wrap_socket skeptics on the
requests team to reconsider - Tornado's SSLIOStream works well. The
asynchronous use of wrap_socket isn't so subtle and risky with buffering)

-Ben

On Fri, Jun 2, 2017 at 8:07 AM Antoine Pitrou  wrote:

>
> Thanks you for all the explanations.  So to summarize my opinion, I'm
> still -0.5 on this PEP.  I would also like to see the issues Jython,
> Ubuntu et al. have mentioned solved before this is accepted.
>
> Regards
>
> Antoine.
>
>
>
> On Fri, 2 Jun 2017 11:42:58 +0200
> Victor Stinner  wrote:
> > Thanks Cory for the long explanation. Let me try to summarize (tell me
> > if I'm wrong).
> >
> > We have 3 options:
> >
> > * Do nothing: reject the PEP 546 and let each project handles security
> > on its own (current status co)
> > * Write *new* C code, maybe using certitude as a starting point, to
> > offload certifcate validation on Windows and macOS
> > * Backport existing code from master to 2.7: MemoryBIO and SSLObject
> >
> > Writing new code seems more risky and error-prone than backporting
> > already "battle-tested" MemoryBIO from master. I also expect that
> > writing code to validate certificate will be longer than the "100
> > lines of C code in (probably)" expected by Steve Dower.
> >
> > rust-certitude counts around 700 lines of Rust and 80 lines of Python
> > code. But maybe I misunderstood the purpose of certitude: Steve Dower
> > asked to only validate a certificate, not load or export CA.
> >
> > I counted 150 Python lines for SSLObject and 230 C lines for MemoryBIO.
> >
> > Since the long term plan is to not use stdlib ssl but a new
> > implementation on Windows and macOS, it seems worthless to backport
> > MemoryBIO on Python 2.7. The PEP 546 (backport MemoryBIO) is a
> > practical solution to provide a *smooth* transition from ssl to a new
> > TLS API. The experience showed that hard changes like "run 2to3 and
> > drop your Python 2 code" doesn't work in practice. Users want a
> > transition plan with small steps.
> >
> > Victor
> >
> > 2017-06-02 11:08 GMT+02:00 Cory Benfield :
> > >
> > > On 1 Jun 2017, at 20:59, Steve Dower  wrote:
> > >
> > > On 01Jun2017 1010, Nathaniel Smith wrote:
> > >
> > > I believe that for answering this question about the ssl module, it's
> really
> > > only Linux users that matter, since pip/requests/everyone else pushing
> for
> > > this only want to use ssl.MemoryBIO on Linux. Their plan on
> Windows/MacOS
> > > (IIUC) is to stop using the ssl module entirely in favor of new ctypes
> > > bindings for their respective native TLS libraries.
> > > (And yes, in principle it might be possible to write new ctypes-based
> > > bindings for openssl, but (a) this whole project is already teetering
> on the
> > > verge of being impossible given the resources available, so adding any
> major
> > > extra deliverable is likely to sink the whole thing, and (b) compared
> to the
> > > proprietary libraries, openssl is *much* harder and riskier to wrap at
> the
> > > ctypes level because it has different/incompatible ABIs depending on
> its
> > > micro version and the vendor who distributed it. This is why manylinux
> > > packages that need openssl have to ship their own, but pip can't and
> > > shouldn't ship its own openssl for many hopefully obvious reasons.)
> > >
> > >
> > > How much of a stop-gap would it be (for Windows at least) to override
> > > OpenSSL's certificate validation with a call into the OS? This leaves
> most
> > > of the work with OpenSSL, but lets the OS say yes/no to the
> certificates
> > > based on its own configuration.
> > >
> > > For Windows, this is under 100 lines of C code in (probably) _ssl, and
> while
> > > I think an SChannel based approach is the better way to go
> long-term,[1]
> > > offering platform-specific certificate validation as the default in
> 2.7 is
> > > far more palatable than backporting new public API.
> > >
> > >
> > > It’s entirely do-able. This is where I reveal just how long I’ve been
> > > fretting over this problem: 

Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Antoine Pitrou

Thanks you for all the explanations.  So to summarize my opinion, I'm
still -0.5 on this PEP.  I would also like to see the issues Jython,
Ubuntu et al. have mentioned solved before this is accepted.

Regards

Antoine.



On Fri, 2 Jun 2017 11:42:58 +0200
Victor Stinner  wrote:
> Thanks Cory for the long explanation. Let me try to summarize (tell me
> if I'm wrong).
> 
> We have 3 options:
> 
> * Do nothing: reject the PEP 546 and let each project handles security
> on its own (current status co)
> * Write *new* C code, maybe using certitude as a starting point, to
> offload certifcate validation on Windows and macOS
> * Backport existing code from master to 2.7: MemoryBIO and SSLObject
> 
> Writing new code seems more risky and error-prone than backporting
> already "battle-tested" MemoryBIO from master. I also expect that
> writing code to validate certificate will be longer than the "100
> lines of C code in (probably)" expected by Steve Dower.
> 
> rust-certitude counts around 700 lines of Rust and 80 lines of Python
> code. But maybe I misunderstood the purpose of certitude: Steve Dower
> asked to only validate a certificate, not load or export CA.
> 
> I counted 150 Python lines for SSLObject and 230 C lines for MemoryBIO.
> 
> Since the long term plan is to not use stdlib ssl but a new
> implementation on Windows and macOS, it seems worthless to backport
> MemoryBIO on Python 2.7. The PEP 546 (backport MemoryBIO) is a
> practical solution to provide a *smooth* transition from ssl to a new
> TLS API. The experience showed that hard changes like "run 2to3 and
> drop your Python 2 code" doesn't work in practice. Users want a
> transition plan with small steps.
> 
> Victor
> 
> 2017-06-02 11:08 GMT+02:00 Cory Benfield :
> >
> > On 1 Jun 2017, at 20:59, Steve Dower  wrote:
> >
> > On 01Jun2017 1010, Nathaniel Smith wrote:
> >
> > I believe that for answering this question about the ssl module, it's really
> > only Linux users that matter, since pip/requests/everyone else pushing for
> > this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS
> > (IIUC) is to stop using the ssl module entirely in favor of new ctypes
> > bindings for their respective native TLS libraries.
> > (And yes, in principle it might be possible to write new ctypes-based
> > bindings for openssl, but (a) this whole project is already teetering on the
> > verge of being impossible given the resources available, so adding any major
> > extra deliverable is likely to sink the whole thing, and (b) compared to the
> > proprietary libraries, openssl is *much* harder and riskier to wrap at the
> > ctypes level because it has different/incompatible ABIs depending on its
> > micro version and the vendor who distributed it. This is why manylinux
> > packages that need openssl have to ship their own, but pip can't and
> > shouldn't ship its own openssl for many hopefully obvious reasons.)
> >
> >
> > How much of a stop-gap would it be (for Windows at least) to override
> > OpenSSL's certificate validation with a call into the OS? This leaves most
> > of the work with OpenSSL, but lets the OS say yes/no to the certificates
> > based on its own configuration.
> >
> > For Windows, this is under 100 lines of C code in (probably) _ssl, and while
> > I think an SChannel based approach is the better way to go long-term,[1]
> > offering platform-specific certificate validation as the default in 2.7 is
> > far more palatable than backporting new public API.
> >
> >
> > It’s entirely do-able. This is where I reveal just how long I’ve been
> > fretting over this problem: https://pypi.python.org/pypi/certitude. Ignore
> > the description, it’s wildly out-of-date: let me summarise the library
> > instead.
> >
> > Certitude is a Python library that uses CFFI and Rust to call into the
> > system certificate validation libraries on macOS and Windows using a single
> > unified API. Under the covers it has a whole bunch of Rust code that
> > translates from what OpenSSL can give you (a list of certificates in the
> > peer cert chain in DER format) and into what those two operating systems
> > expect. The Rust code for Windows is here[1] and is about as horrifying a
> > chunk of Rust as you can imagine seeing (the Windows API does not translate
> > very neatly into Rust so the word “unsafe” appears a lot), but it does
> > appear to work, at least in the mainline cases and in the few tests I have.
> > The macOS code is here[2] and is moderately less horrifying, containing no
> > instances of the word “unsafe”.
> >
> > I lifted this approach from Chrome, because at the moment this is what they
> > do: they use their custom fork of OpenSSL (BoringSSL) to do the actual TLS
> > protocol manipulation, but hand the cert chain verification off to
> > platform-native libraries on Windows and macOS.
> >
> > I have never productised this library because ultimately I never had the
> > 

Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread INADA Naoki
> I would be curious of another test: use pymalloc for objects larger
> than 512 bytes. For example, allocate up to 4 KB?

Since current pool size is 4KB and there is pool_header in pool,
we can't allocate 4KB block from pool.
And if support 1KB block, only 3KB of 4KB can be actually used.
I think 512 bytes / 4KB (1/8) is good ratio.

Do you mean increase pool size?

How about adding configure option like server-mode?

SMALL_REQUEST_THRESHOLD  1024   // 2x
POOL_SIZE  (16*1024)// 4x
ARENA_SIZE  (2*1024*1024)   // 8x, and same to huge page size.

>
> In the past, we already changed the maximum size from 256 to 512 to
> support most common Python objects on 64-bit platforms. Since Python
> objects contain many pointers: switching from 32 bit to 64 bit can
> double the size of the object in the worst case.
>

Make sense.


Naoki
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Cory Benfield

> On 2 Jun 2017, at 10:42, Victor Stinner  wrote:
> 
> Writing new code seems more risky and error-prone than backporting
> already "battle-tested" MemoryBIO from master. I also expect that
> writing code to validate certificate will be longer than the "100
> lines of C code in (probably)" expected by Steve Dower.
> 
> rust-certitude counts around 700 lines of Rust and 80 lines of Python
> code. But maybe I misunderstood the purpose of certitude: Steve Dower
> asked to only validate a certificate, not load or export CA.

That’s all certitude does. The docs of certitude are from an older version of 
the project when I was considering just doing a live-export to PEM file, before 
I realised the security concerns of that approach.

We’d also require some other code that lives outside certitude. In particular, 
code needs to be written to handle the OpenSSL verify callback to save off the 
cert chain and to translate errors appropriately. There’s also a follow-on 
problem: the ssl module allows you to call SSLContext.load_default_certs and 
then SSLContext.load_verify_locations. If you do that, those two behave 
*additively*: both the default certs and custom verify locations are trusted. 
Certitude doesn’t allow you to do that: it says that if you choose to use the 
system certs then darn it that’s all you get. Working out how to do that 
without just importing random stuff into the user’s keychain would be…tricky. 
Do-able, for sure, but would require code I haven’t written for Certitude (I 
may have written it using ctypes elsewhere though).

Cory
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Victor Stinner
I would be curious of another test: use pymalloc for objects larger
than 512 bytes. For example, allocate up to 4 KB?

In the past, we already changed the maximum size from 256 to 512 to
support most common Python objects on 64-bit platforms. Since Python
objects contain many pointers: switching from 32 bit to 64 bit can
double the size of the object in the worst case.

Victor

2017-06-01 9:38 GMT+02:00 Larry Hastings :
>
>
> When CPython's small block allocator was first merged in late February 2001,
> it allocated memory in gigantic chunks it called "arenas".  These arenas
> were a massive 256 KILOBYTES apiece.
>
> This tunable parameter has not been touched in the intervening 16 years.
> Yet CPython's memory consumption continues to grow.  By the time a current
> "trunk" build of CPython reaches the REPL prompt it's already allocated 16
> arenas.
>
> I propose we make the arena size larger.  By how much?  I asked Victor to
> run some benchmarks with arenas of 1mb, 2mb, and 4mb.  The results with 1mb
> and 2mb were mixed, but his benchmarks with a 4mb arena size showed
> measurable (>5%) speedups on ten benchmarks and no slowdowns.
>
> What would be the result of making the arena size 4mb?
>
> CPython could no longer run on a computer where at startup it could not
> allocate at least one 4mb continguous block of memory.
> CPython programs would die slightly sooner in out-of-memory conditions.
> CPython programs would use more memory.  How much?  Hard to say.  It depends
> on their allocation strategy.  I suspect most of the time it would be < 3mb
> additional memory.  But for pathological allocation strategies the
> difference could be significant.  (e.g: lots of allocs, followed by lots of
> frees, but the occasional object lives forever, which means that the arena
> it's in can never be freed.  If 1 out of ever 16 256k arenas is kept alive
> this way, and the objects are spaced out precisely such that now it's 1 for
> every 4mb arena, max memory use would be the same but later stable memory
> use would hypothetically be 16x current)
> Many programs would be slightly faster now and then, simply because we call
> malloc() 1/16 as often.
>
>
> What say you?  Vote for your favorite color of bikeshed now!
>
>
> /arry
>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Victor Stinner
Thanks Cory for the long explanation. Let me try to summarize (tell me
if I'm wrong).

We have 3 options:

* Do nothing: reject the PEP 546 and let each project handles security
on its own (current status co)
* Write *new* C code, maybe using certitude as a starting point, to
offload certifcate validation on Windows and macOS
* Backport existing code from master to 2.7: MemoryBIO and SSLObject

Writing new code seems more risky and error-prone than backporting
already "battle-tested" MemoryBIO from master. I also expect that
writing code to validate certificate will be longer than the "100
lines of C code in (probably)" expected by Steve Dower.

rust-certitude counts around 700 lines of Rust and 80 lines of Python
code. But maybe I misunderstood the purpose of certitude: Steve Dower
asked to only validate a certificate, not load or export CA.

I counted 150 Python lines for SSLObject and 230 C lines for MemoryBIO.

Since the long term plan is to not use stdlib ssl but a new
implementation on Windows and macOS, it seems worthless to backport
MemoryBIO on Python 2.7. The PEP 546 (backport MemoryBIO) is a
practical solution to provide a *smooth* transition from ssl to a new
TLS API. The experience showed that hard changes like "run 2to3 and
drop your Python 2 code" doesn't work in practice. Users want a
transition plan with small steps.

Victor

2017-06-02 11:08 GMT+02:00 Cory Benfield :
>
> On 1 Jun 2017, at 20:59, Steve Dower  wrote:
>
> On 01Jun2017 1010, Nathaniel Smith wrote:
>
> I believe that for answering this question about the ssl module, it's really
> only Linux users that matter, since pip/requests/everyone else pushing for
> this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS
> (IIUC) is to stop using the ssl module entirely in favor of new ctypes
> bindings for their respective native TLS libraries.
> (And yes, in principle it might be possible to write new ctypes-based
> bindings for openssl, but (a) this whole project is already teetering on the
> verge of being impossible given the resources available, so adding any major
> extra deliverable is likely to sink the whole thing, and (b) compared to the
> proprietary libraries, openssl is *much* harder and riskier to wrap at the
> ctypes level because it has different/incompatible ABIs depending on its
> micro version and the vendor who distributed it. This is why manylinux
> packages that need openssl have to ship their own, but pip can't and
> shouldn't ship its own openssl for many hopefully obvious reasons.)
>
>
> How much of a stop-gap would it be (for Windows at least) to override
> OpenSSL's certificate validation with a call into the OS? This leaves most
> of the work with OpenSSL, but lets the OS say yes/no to the certificates
> based on its own configuration.
>
> For Windows, this is under 100 lines of C code in (probably) _ssl, and while
> I think an SChannel based approach is the better way to go long-term,[1]
> offering platform-specific certificate validation as the default in 2.7 is
> far more palatable than backporting new public API.
>
>
> It’s entirely do-able. This is where I reveal just how long I’ve been
> fretting over this problem: https://pypi.python.org/pypi/certitude. Ignore
> the description, it’s wildly out-of-date: let me summarise the library
> instead.
>
> Certitude is a Python library that uses CFFI and Rust to call into the
> system certificate validation libraries on macOS and Windows using a single
> unified API. Under the covers it has a whole bunch of Rust code that
> translates from what OpenSSL can give you (a list of certificates in the
> peer cert chain in DER format) and into what those two operating systems
> expect. The Rust code for Windows is here[1] and is about as horrifying a
> chunk of Rust as you can imagine seeing (the Windows API does not translate
> very neatly into Rust so the word “unsafe” appears a lot), but it does
> appear to work, at least in the mainline cases and in the few tests I have.
> The macOS code is here[2] and is moderately less horrifying, containing no
> instances of the word “unsafe”.
>
> I lifted this approach from Chrome, because at the moment this is what they
> do: they use their custom fork of OpenSSL (BoringSSL) to do the actual TLS
> protocol manipulation, but hand the cert chain verification off to
> platform-native libraries on Windows and macOS.
>
> I have never productised this library because ultimately I never had the
> time to spend writing a sufficiently broad test-case to confirm to me that
> it worked in all cases. There are very real risks in calling these APIs
> directly because if you get it wrong it’s easy to fail open.
>
> It should be noted that right now certitude only works with, surprise,
> PyOpenSSL. Partly this is because the standard library does not expose
> SSL_get_peer_cert_chain, but even if it did that wouldn’t be enough as
> OpenSSL with VERIFY_NONE does not actually *save* 

Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE

2017-06-02 Thread Antoine Pitrou
On Thu, 1 Jun 2017 20:21:12 -0700
Larry Hastings  wrote:
> On 06/01/2017 02:50 AM, Antoine Pitrou wrote:
> > Another possible strategy is: allocate several arenas at once (using a
> > larger mmap() call), and use MADV_DONTNEED to relinquish individual
> > arenas.  
> 
> Thus adding a *fourth* layer of abstraction over memory we get from the OS?
> 
> block -> pool -> arena -> "multi-arena" -> OS

Not a layer of abstraction, just over-allocation.  You would
over-allocate arenas like you over-allocate a list object's elements
storage (though probably using a different over-allocation
strategy ;-)).  That would reduce the number of mmap() calls (though
not necessarily the number of munmap() or madvise() calls), and also
provide more opportunities for the kernel to allocate a large page.

But you would still handle individual arenas in the allocation code.

> Y'know, this might actually make things faster.  These multi-arenas 
> could be the dynamically growing thing Victor wants to try.  We allocate 
> 16mb, then carve it up into arenas (however big those are), then next 
> time allocate 32mb or what have you.

I hope those are not the actual numbers you're intending to use ;-)
I still think that allocating more than 1 or 2MB at once would be
foolish.  Remember this is data that's going to be carved up into
(tens of) thousands of small objects.  Large objects eschew the small
object allocator (not to mention that third-party libraries like Numpy
may be using different allocation routines when they allocate very
large data).

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

2017-06-02 Thread Cory Benfield

> On 1 Jun 2017, at 20:59, Steve Dower  wrote:
> 
> On 01Jun2017 1010, Nathaniel Smith wrote:
>> I believe that for answering this question about the ssl module, it's really 
>> only Linux users that matter, since pip/requests/everyone else pushing for 
>> this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS 
>> (IIUC) is to stop using the ssl module entirely in favor of new ctypes 
>> bindings for their respective native TLS libraries.
>> (And yes, in principle it might be possible to write new ctypes-based 
>> bindings for openssl, but (a) this whole project is already teetering on the 
>> verge of being impossible given the resources available, so adding any major 
>> extra deliverable is likely to sink the whole thing, and (b) compared to the 
>> proprietary libraries, openssl is *much* harder and riskier to wrap at the 
>> ctypes level because it has different/incompatible ABIs depending on its 
>> micro version and the vendor who distributed it. This is why manylinux 
>> packages that need openssl have to ship their own, but pip can't and 
>> shouldn't ship its own openssl for many hopefully obvious reasons.)
> 
> How much of a stop-gap would it be (for Windows at least) to override 
> OpenSSL's certificate validation with a call into the OS? This leaves most of 
> the work with OpenSSL, but lets the OS say yes/no to the certificates based 
> on its own configuration.
> 
> For Windows, this is under 100 lines of C code in (probably) _ssl, and while 
> I think an SChannel based approach is the better way to go long-term,[1] 
> offering platform-specific certificate validation as the default in 2.7 is 
> far more palatable than backporting new public API.

It’s entirely do-able. This is where I reveal just how long I’ve been fretting 
over this problem: https://pypi.python.org/pypi/certitude 
. Ignore the description, it’s wildly 
out-of-date: let me summarise the library instead.

Certitude is a Python library that uses CFFI and Rust to call into the system 
certificate validation libraries on macOS and Windows using a single unified 
API. Under the covers it has a whole bunch of Rust code that translates from 
what OpenSSL can give you (a list of certificates in the peer cert chain in DER 
format) and into what those two operating systems expect. The Rust code for 
Windows is here[1] and is about as horrifying a chunk of Rust as you can 
imagine seeing (the Windows API does not translate very neatly into Rust so the 
word “unsafe” appears a lot), but it does appear to work, at least in the 
mainline cases and in the few tests I have. The macOS code is here[2] and is 
moderately less horrifying, containing no instances of the word “unsafe”.

I lifted this approach from Chrome, because at the moment this is what they do: 
they use their custom fork of OpenSSL (BoringSSL) to do the actual TLS protocol 
manipulation, but hand the cert chain verification off to platform-native 
libraries on Windows and macOS.

I have never productised this library because ultimately I never had the time 
to spend writing a sufficiently broad test-case to confirm to me that it worked 
in all cases. There are very real risks in calling these APIs directly because 
if you get it wrong it’s easy to fail open.

It should be noted that right now certitude only works with, surprise, 
PyOpenSSL. Partly this is because the standard library does not expose 
SSL_get_peer_cert_chain, but even if it did that wouldn’t be enough as OpenSSL 
with VERIFY_NONE does not actually *save* the peer cert chain anywhere. That 
means even with PyOpenSSL the only way to get the peer cert chain is to hook 
into the verify callback and save off the certs as they come in, a gloriously 
absurd solution that is impossible with pure-Python code from the ssl module.

While this approach could work with _ssl.c, it ultimately doesn’t resolve the 
issue. It involves writing a substantial amount of new code which needs to be 
maintained by the ssl module maintainers. All of this code needs to be tested 
*thoroughly*, because python-dev would be accepting responsibility for the 
incredibly damaging potential CVEs in that code. And it doesn’t get python-dev 
out of the business of shipping OpenSSL on macOS and Windows, meaning that 
python-dev continues to bear the burden of OpenSSL CVEs, as well as the brand 
new CVEs that it is at risk of introducing.

Oh, and it can’t be backported to Python 2.7 or any of the bugfix-only Python 3 
releases, and as I just noted the ssl module has never made it possible to use 
this approach from outside CPython. So it’s strictly just as bad as the 
situation PEP 543 is in, but with more C code. Doesn’t sound like a winning 
description to me. ;)

Cory

[1]: 
https://github.com/Lukasa/rust-certitude/blob/master/rust-certitude/src/windows.rs
 

[2]: