[Python-Dev] Re: What is a public API?

2019-07-31 Thread Glyph

> On Jul 13, 2019, at 1:56 PM, Serhiy Storchaka wrote:
> 
> I thought that the name in a module is in the public interface if:
> 
> * It doesn't start with an underscore and the module does not have __all__.
> * It is included in the module's __all__ list.
> * It is explicitly documented as a part of the public interface.
> 
> help() uses more complex rules, but it believes __all__ if it is defined.
> 
> But seems there are different views on this.
> 
> * Raymond suggested adding an underscore to the two dozen names in the 
> calendar module not included in __all__.
> https://bugs.python.org/issue28292#msg347758
> 
> I do not like this idea, because it looks like code churn and makes the 
> code less readable.
> 
> * Gregory suggests documenting codecs.escape_decode() despite the fact that 
> it is not included in __all__.
> https://bugs.python.org/issue30588
> 
> I do not like this idea, because this function was always internal; its only 
> purpose was implementing the "string-escape" codec, which was removed in 
> Python 3 (for reasons). In Python 3 it is only used for supporting the old 
> pickle protocol 0.
> 
> Could we strictly define what is considered a public module interface in 
> Python?

My apologies for not having read this very large thread before posting, but 
hopefully this small note won't be adding too much fuel to the fire:

Earlier this year I created an extremely small project called "publication" 
(https://pypi.org/project/publication/, https://github.com/glyph/publication) 
which attempts to harmonize the lofty ideal of "only the names explicitly 
mentioned in __all__ and in the documentation!" with the realpolitik of 
"anything I can reach where I don't have to type a leading underscore is fair 
game".  It simply removes the ability to externally invoke non-__all__-exported 
names without importing an explicitly named "._private" namespace.  It does not 
add any new syntactic idiom like a @public decorator (despite the aesthetic 
benefits of doing something like that) so that existing IDEs, type checkers, 
refactoring tools, code browsers etc can use the existing __all__ idiom and not 
break.  It intentionally doesn't try hard to hide the implementation; it's 
still Python and if you demonstrate that you know what you're doing you're 
welcome to all the fiddly internals, it just makes sure you know that that's 
what you're getting.
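
For the curious, usage looks roughly like this; a sketch based on the
project's README of that era (the publish() entry point), not a normative
reference:

    from publication import publish

    def _helper():
        "Implementation detail; not in __all__."

    def api():
        "The public entry point."
        return _helper()

    __all__ = ["api"]

    publish()   # past this point, only names in __all__ are importable from
                # the module itself; everything else moves behind an
                # explicitly named ._private namespace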

While I am perhaps infamously a stdlib contrarian ;-) this project is a single 
module with extremely straightforward, stable semantics which I would 
definitely not mind being copy/pasted into the stdlib wholesale, either under a 
different (private, experimental) name, or even under its current one if folks 
like it.  I'd be very pleased if this could solve the issue for the calendar 
module.

Thanks for reading,

-glyph


Re: [Python-Dev] Yearly PyPI breakage

2016-05-04 Thread Glyph
On May 3, 2016, at 9:15 PM, Stefan Krah <ste...@bytereef.org> wrote:
> 
>> [cut overlong post]
> 
> Glyph,
> 
> nice sneaky way to try to divert from the original issue.

The original issue, as I understood it at the start of the thread, was "which 
hoops do I have to jump through this year in order to keep pip downloads 
working?"  So I showed you the hoops.

Your question implied, to me, that you were not presently aware of how easy it 
is to simply build and upload your packages with sdist and twine.  After years 
and years of horrible setuptools bugs, it certainly seemed plausible to me that 
if you had long-standing experience with python packaging, you might perhaps 
have developed a well-deserved low opinion of the tooling in the past.  
Therefore, you might not be aware of relatively recent improvements to the 
packaging ecosystem which have made this problem trivial to solve.

My intent was, therefore, simply to demonstrate that things have improved, and 
that this was not a hard thing for you to do and could be resolved with a 
minimum of fuss.

I confess that before posting I was made aware that you'd had some personality 
conflicts with some PyPI maintainers in the past, but that sentence is about 
the full extent and detail level of my understanding; I was not aware of how 
deep those conflicts ran.  The reason I jumped into this particular thread, 
when I rarely participate in python-dev, was that I hoped a simple explanation 
of the facts of the matter from someone you hadn't previously interacted with 
could address your concerns.

> Your whole post is invalidated by the simple fact that the URL was protected 
> by a hash (which I repeatedly asked to be upgraded to sha256).

Based only on previous discussion here, I had no way to know either of those 
things.  You didn't reference it in the post I was replying to, or in your 
original post.  And, as you say later, PyPI's download URL doesn't include the 
hash any more, so it wasn't there for me to observe.  (There were some manual 
instructions in your package description but no automated tooling will honor 
that.)  In any case, fragment hashes are not really a suitable general-purpose 
mechanism as they are only honored by specific tools (like pip) whereas HTTPS 
verification ought to be universally supported, so IMHO it is a good thing that 
PyPI is discouraging their use for this purpose.

> This was the official scheme promoted by PEP-438, which you should know.  But 
> of course your actual intention here is character assassination, pretending 
> to "rescue" cdecimal

In the "overlong" post that you elided, I specifically said I didn't intend to 
maintain it for long. If this wasn't clear, what I meant to say by that comment 
was that I would keep the index entry available until you had the opportunity 
to upload some sdists and wheels yourself to PyPI.  If you don't intend to, I 
am not the right person to "rescue" the package; someone else who is more 
invested in cdecimal should provide an alternate PyPI entry, or take over this 
one.

> and trying to divert from the fact that
> the transition to PEP 470 was handled suboptimally.

I don't see any need to divert attention from this fact, because you appear to 
be in a minority of one in considering it so.

> The very reason for this thread is that the security was silently disabled 
> WITHOUT me getting a notification.  What is on PyPI *now* is not what I 
> configured!

If that was the reason for the thread, you would have been better served by 
making that specific complaint rather than asking for information, and then 
yelling at the people who provided it to you.  You might also consider 
reporting these issues to an appropriate forum, since python-dev is not the 
bugtracker for PyPI.  You can find that here: 
<https://bitbucket.org/pypa/pypi/issues>.  You might also want to continue 
this thread on distutils-sig; I'm sorry for contributing to the noise on 
python-dev, but I thought getting a high-profile package such as cdecimal 
integrated into the modern packaging ecosystem would be worth the off-topic 
digression.

> [various spurious and irrelevant ad-hominem attacks redacted]


Perhaps naively, given the level of hostility on display here, I still hope 
that you might see the wisdom in simply uploading build artifacts to PyPI.  But 
I won't try to convince you further.

-glyph



Re: [Python-Dev] Yearly PyPI breakage

2016-05-03 Thread Glyph
On May 3, 2016, at 2:38 PM, Stefan Krah <ste...@bytereef.org> wrote:
> 
> But making them completely unreachable does not increase reliability. :)

But it does increase security.

The other motivation, besides reliability, listed in this section 
<https://www.python.org/dev/peps/pep-0470/#my-users-have-a-worse-experience-with-this-pep-than-before-how-do-i-explain-that>,
is that:

"transparently including external links [is] a security hazard (given that in 
most cases it allowed a MITM to execute arbitrary Python code on the end users 
machine)".

And, indeed, the URL presently listed on PyPI for the cdecimal upload is an 
unverified http URL.  This means that any evil barista with access to a 
coffee-shop wifi router could instantly execute user-privileged code on any 
Python programmer's laptop if they were to `pip install` this externally hosted 
package, which is one of the reasons why neither `pip` nor `pypi` allow such a 
thing any more.

Please believe me when I say I do not mean the following to be insulting - 
information security is incredibly confusing, difficult, and rapidly evolving, 
and I don't blame you for getting it wrong - but maintaining a popular package 
in this way is dangerously irresponsible.  There are solid social reasons to 
centralize the control of the default package repository in the hands of 
dedicated experts who can scale their security expertise to a large audience, 
so that package authors like you and I don't need to do this in order to 
prevent Python from gaining a reputation as a vector for malware; this package 
is a case in point.

Separately from the issue of how PyPI works, even if you have some reason you 
need to host it externally (which I seriously doubt), please take the trouble 
to set up a server with properly verified TLS, or use a '.github.io' hostname 
that can be verified that way.

In the meanwhile, just to demonstrate that it's a trivial amount of work to 
just host it on PyPI, I checked out this package via a verified mechanism ("git 
clone https://github.com/bytereef/bytereef.github.io") and created a new 
pypi-cdecimal package <https://pypi.python.org/pypi/pypi-cdecimal>, via editing 
the setup.py to change the name, 'python setup.py register', 'python setup.py 
sdist', 'pip wheel' (for some reason direct 'python setup.py bdist_wheel' 
didn't work), and 'twine upload'.  `pip install pypi-cdecimal` should now work 
and get you an importable `cdecimal`, and if you happen to be lucky enough to 
run the same OS version I am, you won't even need to build C code.  cdecimal 
users may wish to retrieve it via this mechanism until there's a secure way to 
get the proper upstream distribution.

If anyone wants package-index access to this name to upload Windows or 
manylinux wheels just let me know; however, as this is just a proof of concept, 
I do not intend to maintain it long-term.

-glyph



Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github

2014-11-30 Thread Glyph

On Nov 30, 2014, at 11:17, Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Nov 30, 2014 at 8:54 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> On 30 November 2014 at 15:23, Chris Angelico <ros...@gmail.com> wrote:
>>> Python is already using quite a bit of non-free software in its
>>> ecosystem. The Windows builds of CPython are made with Microsoft's
>>> compiler, and the recent discussion about shifting to Cygwin or MinGW
>>> basically boiled down to "but it ought to be free software", and that
>>> was considered not a sufficiently strong argument. In each case, the
>>> decision has impact on other people (using MSVC for the official
>>> python.org installers means extension writers need to use MSVC too;
>>> and using GitHub means that contributors are strongly encouraged,
>>> possibly required, to use GitHub); so why is it acceptable to use a
>>> non-free compiler, but not acceptable to use a non-free host?
>>
>> Relying on non-free software to support users of a non-free platform
>> is rather different from *requiring* the use of non-free software to
>> participate in core Python community design processes.
>
> But what non-free software is required to use the community design
> processes? The GitHub client is entirely optional; I don't use it, I
> just use git itself. Using a free client to access a proprietary
> server isn't the same as using non-free software.

Also keep in mind that unless you are using a very esoteric hardware setup and 
dedicated leased lines that you purchased yourself, you are likely to be using 
copyrighted, patented, proprietary software at a number of levels:

- the microcode implementation in your CPU
- the firmware in your GPU
- the firmware in your network card
- the boot code (e.g.: BIOS or EFI implementation) of your motherboard
- the firmware in your router
- or the firmware in your cable or DSL modem, if you thought to get a free
  router with OpenWRT or something in it
- the firmware in your ISP's router
- the firmware in the backbone's routers
- the firmware in the PSF's ISP's routers

Does this sound like ridiculous nitpicking?  Of course it does!  If you refused 
to use all that stuff you just wouldn't be able to access the internet at all, 
regardless of your personal choices.  Most layers of this stack are _less_ 
constrained to open standards and open data access than Github, and can require 
layers and layers of additional proprietary software or reverse engineering 
(ask anyone who has tried to play a video game on a Linux computer what the 
experience is like without gobs of proprietary blobs from nvidia or ATI).

And as the story of BitKeeper shows, if a proprietary platform decides to do 
something bad, and the cost of migration is within your means, you can just 
leave.  This is far more true of Github than of BitKeeper: Linux had to create 
a totally new VCS to migrate off of that, whereas we would just have to 
install Trac or Roundup or something again.  (And, as per the 
presently-proposed PEP, we wouldn't even have to install them again, just 
change some configuration to put a few repositories back.)

The monoculture about Github concerns me.  I also have concerns about the 
long-term consequences of not having an all-free-software stack.  But focusing 
on avoiding services like Github at this point in history is just a gigantic 
waste of time; it's resolving dependencies in the wrong order.

The only reason to avoid Github is ideological purity, and even then it's not 
even purity because you still have to accept this other ideological 
contamination.  Except, what about other ideological concepts that are 
important to the Python core developers, like equitable representation for 
women and minorities, and compassionate and polite interactions within the 
community?  Clearly we can't require the use of Linux.  If we treat these 
ideals as equal in priority to free software (even if we ignore the invisible 
dependencies like the list I just made above), then there is 
literally no software left that we can use to develop Python. Linux kernel and 
GNU low-level user-land development are a perpetual cesspit, and their leaders 
serve as a continuous embarrassment to our community.

And speaking of equitable representation, one proven technique we have learned 
for dealing with that problem is making it easy for newcomers of all stripes 
to access the community.  Like it or not, Github's popularity means that it's a 
place where most newcomers to programming are learning to use version control 
and bug tracking.  This is a network effect that can have a real impact on 
people's lives.  Requiring newcomers to learn our weird technology religion 
before they can contribute creates a real barrier to entry, which in turn makes 
our community more insular and homogenous.

Some days you get the Git, and some days the Github gets you.  The sooner we, 
as a community and a culture, can accept this and move on, the more time will 
be available to actually build replacements for these things.

-glyph

Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz
On Aug 29, 2014, at 7:44 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:

> Disabling verification entirely externally to the program, through a CLI
> flag or environment variable: I'm pretty down on this idea; the problem you
> hit is that it's a pretty blunt instrument to swing, and it's almost
> impossible to imagine it not hitting things it shouldn't. It's far too
> likely to be used in applications that make two sets of outbound
> connections: 1) to some internal service which you want to disable
> verification on, and 2) some external service which needs strong
> validation. A global flag causes the latter to fail silently when subjected
> to a MITM attack, and that's exactly what we're trying to avoid. It also
> makes things much harder for library authors: I write an API client for
> some API, and make TLS connections to it. I want those to be verified by
> default. I can't even rely on the httplib defaults, because someone might
> disable them from the outside.


I would strongly recommend against such a mechanism.

For what it's worth, Twisted simply unconditionally started verifying 
certificates in 14.0 with no disable switch, and (to my knowledge) literally 
no users have complained.

Twisted has a very, very strict backwards compatibility policy.  For example, I 
once refused to accept the deletion of a class that raised an exception upon 
construction, on the grounds that someone might have been inadvertently 
importing that class, and they shouldn't see an exception until they've seen a 
deprecation for one release.

Despite that, we classified failing to verify certificates as a security bug, 
and fixed it with no deprecation period.  When users type the 's' after the 'p' 
and before the ':' in a URL, they implicitly expect browser-like certificate 
verification.

The lack of complaints is despite the fact that 14.0 has been out for several 
months now, and, thanks to the aforementioned strict policy, users tend to 
upgrade fairly often (since they know they can almost always do so without fear 
of application-breaking consequences).  According to PyPI metadata, 14.0.0 has 
had 273283 downloads so far.

Furthermore, "disable verification" is a nonsensical thing to do with TLS.  
"Select a trust root" is a valid configuration option, and OpenSSL already 
provides it via the SSL_CERT_DIR environment variable, so there's no need for 
Python to provide anything beyond that.
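
(Concretely, on Python 3.4+ the stdlib will even tell you which locations,
SSL_CERT_DIR included, OpenSSL consults; a small illustrative snippet:)

    import ssl

    # Reports the env var names and default paths OpenSSL consults for the
    # trust root; SSL_CERT_DIR and SSL_CERT_FILE are among them.
    print(ssl.get_default_verify_paths())

    # A default client context picks those locations up automatically.
    ctx = ssl.create_default_context()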

-glyph



Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz

On Sep 2, 2014, at 4:01 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 08:18, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> Antoine Pitrou <solipsis at pitrou.net> writes:
>>>
>>> And how many people are using Twisted as an HTTPS client?
>>> (compared to e.g. Python's httplib, and all the third-party libraries
>>> building on it?)
>>
>> I don't think anyone could give an honest estimate of these counts,
>> however there's two factors to bear in mind: a) It's extremely strongly
>> recommended to use requests to make any HTTP requests precisely because
>> httplib is negligent in certificate and hostname checking by default,
>> b) We're talking about Python 3, which has fewer users than Python 2.
>
> Creating *new* incompatibilities between Python 2 & Python 3 is a major
> point of concern. One key focus of 3.5 is *reducing* barriers to migration,
> and this PEP would be raising a new one.

No.  Providing the security that the user originally asked for is not a 
backwards incompatible change.  It is a bug fix.  And believe me: I care a 
_LOT_ about reducing barriers to migration.  This would not be on my list of 
the top 1000 things that make migration difficult.

> It's a change worth making, but we have time to ensure there are easy ways
> to do things like skipping cert validation, or tolerating expired
> certificates.


The API already supports both of these things.  What I believe you're 
implicitly saying is that there needs to be a way to do this without editing 
code, and... no, there really doesn't.  Not to mention the fact that you could 
already craft a horrific monkeypatch to allow operators to cause the ssl module 
to malfunction by 'pip install'ing a separate package, which is about as 
supported as this should be.
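
(To give a sense of how low that bar is: such a monkeypatch could plausibly be
as small as the following sketch, using underscored names as they exist on
later Pythons, 3.4.3 and up.  Hypothetical, and emphatically not a
recommendation:)

    import ssl

    # Replace the default HTTPS context factory with the unverified one;
    # being private, these names are exactly as supported as this hack
    # deserves.
    ssl._create_default_https_context = ssl._create_unverified_context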

-glyph



Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz

On Sep 2, 2014, at 4:28 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 09:08, David Reid <dr...@dreid.org> wrote:
>> Nick Coghlan <ncoghlan at gmail.com> writes:
>>>
>>> Creating *new* incompatibilities between Python 2 & Python 3 is a major
>>> point of concern.
>>
>> Clearly this change should be backported to Python 2.
>
> Proposing to break backwards compatibility in a maintenance release (...)


As we keep saying, this is not a break in backwards compatibility; it's a bug 
fix.  Yes, systems might break, but that breakage represents an increase in 
security which may well be operationally important.  Not everyone with a 
working application has the relevant understanding and expertise to know that 
Python's HTTP client is exposing them to surveillance.  These applications 
should break.  That is the very nature of the fix.  It is not a compatibility 
break that the system starts correctly rejecting invalid connections.

By way of analogy, here's another kind of breach in security: an arbitrary 
remote code execution vulnerability in XML-RPC.  I think we all agree that any 
0day RCE vulnerabilities in Python really ought to be fixed and could be 
legitimately included without worrying about backwards compatibility breaks.  
(At least... gosh, I hope so.)

Perhaps this arbitrary remote execution looks harmless: the use of an eval() 
instead of an int() someplace.  Perhaps someone discovered that they can do 
"3 + 4" in their XML-RPC and the server does the computation for them.  
Great!  They start relying on this in their applications to use symbolic 
values in their requests instead of having explicit enumerations.  This can 
save you quite a bit of code!
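
(A sketch of the hypothetical bug, to make the analogy concrete:)

    def parse_count(text):
        return int(text)   # what the server meant: accept only integers

    def parse_count_buggy(text):
        # the "harmless-looking" RCE: now "3 + 4" evaluates to 7... and
        # "__import__('os').system(...)" evaluates to anything at all
        return eval(text)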

When the RCE is fixed, this application will break, and that's fine.  In fact 
that's the whole point of issuing the fix, that people will no longer be able 
to make arbitrary computation requests of your server any more.  If that 
server's maintainer has the relevant context and actually wants the XML-RPC 
endpoint to enable arbitrary RCE, they can easily modify their application to 
start doing eval() on the data that they received, just as someone can easily 
modify their application to intentionally disable all connection security.  
(Let's stop calling it "certificate verification" because that sounds like 
some kind of clerical detail: if you disable certificate verification, TLS 
connections are unauthenticated and unidentified and therefore insecure.)

For what it's worth, on the equivalent Twisted change, I originally had just 
these concerns, but my mind was changed when I considered what exactly the 
user-interface ramifications were for people typing that 's' for 'secure' in 
URLs.  I was convinced, and we made the change, and there have been no ill 
effects that I'm aware of as a result.  In fact, there has been a renewed 
interest in Twisted for HTTP client work, because we finally made security work 
more or less like it's supposed to, and the standard library is so broken.

I care about the health of the broader Python community, so I will 
passionately argue that this change should be made; but for me personally, 
it's a lot easier to justify that everyone should use Twisted (at least since 
14.0) because transport security in the stdlib is such a wreck, and even if it 
gets fixed, it's going to have easy options to turn it off unilaterally, so 
your application can never really be sure it's getting transport security 
when it's requesting transport security.

-glyph



[Python-Dev] Language Summit Follow-Up

2014-05-28 Thread Glyph Lefkowitz
At the language summit, Alex and I volunteered to put together some 
recommendations on what changes could be made to Python (the language) in 
order to facilitate a smoother transition from Python 2 to Python 3.  One of 
the things that motivated this was the (surprising, to us) consideration that 
features like ensurepip might be added to future versions of the 2.7 
installers from python.org.

The specific motivations for writing this are:

- Library maintainers have a rapidly expanding matrix that requires an
  increasing number of branches to satisfy.
- People with large corporate codebases absolutely cannot port all at once.

If you don't have perfect test coverage then you can't make any progress.  So 
these changes are intended to make porting from python 2 to python 3 more 
guided and incremental.  We believe that these attributes are necessary.

We would like to stress that we don't believe anything on this list is as 
important as the continuing efforts that everyone in the broader ecosystem is 
making.  If you just want to ease the transition by working on anything at all, 
the best use of your time right now is porting 
https://warehouse.python.org/project/MySQL-python/ to Python 3. :)

Nevertheless there are some things that the language and CPython could do.

Unfortunately we had to reject any proposal that involved new __future__ 
imports, since unknown __future__ imports are un-catchable SyntaxErrors.

Here are some ideas for Python 2.7+:

1. Add ensurepip to the installers.  Having pip reliably available increases
   the availability of libraries that help with porting, and will generally
   strengthen the broader ecosystem in the (increasingly long) transition
   period.

2. Add some warnings about python 3 compatibility.  It should at least be
   possible to get a warning for every single implicit string coercion (a
   few illustrative examples follow this list), as well as for:
   - Old-style classes.
   - Old-style division.
   - Print statements.
   - Old-style exception syntax.
   - buffer().
   - bytes(memoryview(b'abc')).
   - Importing old locations from the stdlib (see point 4).
   - Long integer syntax.
   - Use of variables beyond the lifetime of an 'except Exception as e'
     block or a list comprehension.

3. Backport 'yield from' to allow people to use Tulip and Tulip-compatible
   code, and to facilitate the development of Tulip-friendly libraries and a
   Tulip ecosystem.  A robust Tulip ecosystem requires the participation of
   people who are not yet using Python 3.

4. Add aliases for the renamed modules in the stdlib.  This will allow
   people to just write "python 3" in a lot more circumstances.

5. (re-)Enable warnings by default, including enabling -3 warnings.  Right
   now all warnings are silent by default, which greatly reduces
   discoverability of future compatibility issues.  I hope it's not
   controversial to say that most new Python code is still being written
   against Python 2.7 today; if people are writing that code in such a way
   that it's not 3-friendly, it should be a more immediately noticeable
   issue.

6. Get rid of 2to3; particularly, of any discussion of using 2to3 in the
   documentation.  More than one very experienced, well-known Python
   developer in this discussion has told me that they thought 2to3 was the
   blessed way to port their code, and it's no surprise that they think so,
   given that the first technique https://docs.python.org/3/howto/pyporting.html
   mentions is still 2to3.  We should replace 2to3 with something like
   https://github.com/mitsuhiko/python-modernize.  2to3 breaks your code on
   python 2, and doesn't necessarily get it running on python 3.  A more
   conservative approach that reduced the amount of work to get your code
   2/3 compatible, but was careful to leave everything working, would be a
   lot more effective.

7. Add a new 'bytes' type that actually behaves like the Python 3 bytes
   type (bytes(5)).
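
To make the string-coercion item concrete, here are a few minimal Python 2
snippets (hypothetical examples, not from the summit notes) that such a
warning would flag; each silently decodes the byte string as ASCII, and so
only blows up when non-ASCII data arrives at runtime:

    # Python 2 -- each line coerces str to unicode implicitly:
    u"caf\xe9" + "!"       # unicode + str
    "%s" % u"caf\xe9"      # str % unicode: the result quietly becomes unicode
    u"abc" == "abc"        # mixed comparison decodes before comparing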

We have rejected any changes for Python 3.5, simply because of the extremely 
long time it would take to get those features into users' hands.  Any changes 
for Python 3 that we're proposing would need to get into a 3.4.x release, so 
that, for example, they can make their way into Ubuntu 14.04 LTS.

Here are some ideas for Python 3.4.x:

1. Usage of Python 2 style syntax (for example, a print statement) or stdlib
   module names (for example, 'import urllib2') should result in a specific,
   informative warning, not a generic SyntaxError/ImportError.  This will
   really help new users.

2. Add 'unicode' back as an alias for 'str' (see the sketch after this
   list).  Just today I was writing some documentation where I had to resort
   to some awkward encoding tricks just to get a bytes object out without
   explaining the whole 2/3 dichotomy in some unrelated prose.
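
(The second item amounts to the compatibility shim many projects already
carry around; a sketch, not the proposed implementation:)

    try:
        unicode            # Python 2: the name already exists
    except NameError:
        unicode = str      # Python 3: what the proposed alias would mean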

We'd like to thank all the individuals who gave input and feedback in creating 
this list.

-glyph  Alex Gaynor



Re: [Python-Dev] Simple IDLE issues to commit before Python 2.7.4 release in two weeks on 4/6/2013

2013-03-25 Thread Glyph

On Mar 25, 2013, at 1:40 PM, Benjamin Peterson <benja...@python.org> wrote:

> ... Assuming PEP 343 becomes policy ...

Are you sure you got this PEP number right?  The 'with' statement?

http://www.python.org/dev/peps/pep-0343/

-glyph


Re: [Python-Dev] About issue 6560

2013-03-16 Thread Glyph
On Mar 14, 2013, at 3:48 PM, Martin v. Löwis <mar...@v.loewis.de> wrote:

> On 14.03.13 at 15:15, Ani Sinha wrote:
>> I was looking into a mechanism to get the aux fields from recvmsg() in
>> python and I came across this issue. Looks like this feature was added
>> in python 3.3. Is there any reason why this feature was not added for
>> python 2.7?
>
> Most certainly: Python 2.7 (and thus Python 2) is feature-frozen; no
> new features can be added to it. People wanting new features need to
> port to Python 3.


Or you can use Twisted: 
http://twistedmatrix.com/trac/browser/trunk/twisted/python/sendmsg.c

That module ought to have no dependencies outside of Twisted.

We only use it for passing file descriptors between processes, but I believe it 
should be able to deal with whatever other types of auxiliary data that you 
need from recvmsg; if not, please file a bug (at http://twistedmatrix.com/).
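
(For reference, the Python 3.3+ stdlib spelling of the file-descriptor-passing
case looks roughly like the following sketch; the function name is
illustrative, and error handling is minimal:)

    import array
    import socket
    import struct

    def recv_fd(sock):
        """Receive one file descriptor over an AF_UNIX socket."""
        fd_size = struct.calcsize("i")
        msg, ancdata, flags, addr = sock.recvmsg(
            1, socket.CMSG_SPACE(fd_size))
        for level, ctype, data in ancdata:
            if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
                return array.array("i", data[:fd_size])[0]
        raise RuntimeError("no file descriptor received")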

-glyph



Re: [Python-Dev] built-in Python test runner (was: Python Language Summit at PyCon: Agenda)

2013-03-05 Thread Glyph
On Mar 4, 2013, at 11:13 PM, Robert Collins <robe...@robertcollins.net> wrote:

> In principle maybe. Need to talk with the trial developers, nose
> developers, py.test developers etc - to get consensus on a number of
> internal API friction points.

Some of trial's lessons might be also useful for the stdlib going forward, 
given the hope of doing some event-loop stuff in the core.

But, I feel like this might be too much to cover at the language summit; there 
could be a test frameworks summit of its own, of about equivalent time and 
scope, and we'd still have a lot to discuss.

Is there a unit testing SIG someone from Twisted ought to be a member of, to 
represent Trial, and to get consensus on these points going forward?

-glyph



Re: [Python-Dev] [Python-checkins] peps: Pre-alpha draft for PEP 435 (enum). The name is not important at the moment, as

2013-02-26 Thread Glyph
On Feb 26, 2013, at 5:25 AM, Eli Bendersky <eli...@gmail.com> wrote:

> Glyph, thanks for the input. I mentioned Twisted because in its code I
> found a number of places with simple string enumerations used to represent
> state. I was not aware of twisted.python.constants, but it doesn't appear
> that this module is used, at least in the places I checked.

Quite so.  twisted.python.constants was created because we made the same 
observation that you did.  Hopefully, more of these protocols will be 
transitioned to make use of twisted.python.constants internally.

> In general, many protocols have some "state" instance var that's usually
> just a string, using either predefined constants or direct string literals.

Indeed.  This idiom varies considerably.  Another thing that Twisted needs is 
a mechanism for explicitly building state machines, but that's a discussion 
for another day.
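
(The idiom in question looks roughly like this; a sketch against the
twisted.python.constants API of that era, with an illustrative class name:)

    from twisted.python.constants import NamedConstant, Names

    class ConnectionState(Names):
        """Replaces ad-hoc 'state' string literals."""
        DISCONNECTED = NamedConstant()
        CONNECTING = NamedConstant()
        CONNECTED = NamedConstant()

    # e.g., in a protocol: self._state = ConnectionState.CONNECTING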

-glyph



Re: [Python-Dev] [Python-checkins] peps: Pre-alpha draft for PEP 435 (enum). The name is not important at the moment, as

2013-02-25 Thread Glyph

On Feb 25, 2013, at 12:32 PM, Barry Warsaw <ba...@python.org> wrote:

>> Dumb question, but are flufl.enums ordered?  That's also an important use
>> case.
>
> Kind of.  Ordered comparisons are explicitly not supported, but iteration
> over the Enum is guaranteed to be returned in int-value order.

Sorry to jump in to a random leaf of this thread, but there is such a barrage 
here I cannot find the beginning :).

I can see in http://www.python.org/dev/peps/pep-0435/#acknowledgments that 
Twisted is mentioned; it should probably reference 
https://twistedmatrix.com/documents/current/api/twisted.python.constants.html 
and https://twistedmatrix.com/documents/current/core/howto/constants.html 
since we actually implemented a thing as well.

(You can order constants by sorting them; off the top of my head, 
NamedConstant, ValueConstant, and FlagConstant all probably behave differently.)

-g


Re: [Python-Dev] I was just thinking that os.path could use some love...

2013-01-30 Thread Glyph
On Jan 30, 2013, at 2:01 PM, Cameron Simpson <c...@zip.com.au> wrote:

> Speaking for myself, I've been having some usefulness with making URL
> objects that are subclasses of str. That lets me pass them to all the
> things that already expect strs, while still having convenience methods.

str subclasses are problematic.  One issue is that they still allow invalid 
manipulations.  If you prohibit those, then manipulations that take multiple 
steps become super inconvenient.  If you allow them, then you end up with 
half-formed values that will error out sometimes, or generate corrupt data 
that shouldn't be allowed to exist (trivial example: a NUL character in the 
middle of a file path).  Also, automatic coercion will sometimes surprise you 
and give you a value of the wrong type if you forget a method or two.
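
(A tiny demonstration of that last pitfall, with a hypothetical URL class:)

    class URL(str):
        """Hypothetical str-subclass URL, as described above."""
        def child(self, segment):
            return URL(self + "/" + segment)

    u = URL("http://example.com/a")
    v = u.upper()        # any inherited str method returns a plain str...
    print(type(v))       # <class 'str'>: the URL-ness silently vanished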

Also URL and file paths have a common interface, but are not totally the same.

Basically, everybody wants to say "composition is better than inheritance", 
except for *this* case, where inheritance seems super convenient.  That's how 
it gets you!  Inheritance _is_ super convenient, but it's also super 
confusing.  Resist the temptation :-).

Once again (I see my previous reply went straight to the sender, not the whole 
list) I recommend https://launchpad.net/filepath as an abstraction that has 
worked very well in a wide variety of situations.

-glyph


Re: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted

2013-01-09 Thread Glyph
On Jan 8, 2013, at 9:14 PM, Guido van Rossum <gu...@python.org> wrote:

> But which half? A socket is two independent streams, one in each
> direction. Twisted uses half_close() for this concept but unless you
> already know what this is for you are left wondering which half. Which
> is why I like using 'write' in the name.

I should add, if you don't already know what this means you really shouldn't be 
trying to do it ;-).
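
(For reference, the underlying operation is shutting down just the write side
at the socket layer; a minimal sketch:)

    import socket

    s = socket.create_connection(("example.com", 80))
    s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    s.shutdown(socket.SHUT_WR)   # close *our* half: no more writes...
    print(s.recv(4096))          # ...but we can still read the response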

-glyph



Re: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted

2013-01-04 Thread Glyph

On Jan 4, 2013, at 8:51 PM, Guido van Rossum <gu...@python.org> wrote:

> On Fri, Jan 4, 2013 at 8:19 PM, Glyph <gl...@twistedmatrix.com> wrote:
>> On Jan 4, 2013, at 3:56 PM, Guido van Rossum <gu...@python.org> wrote:
>>> On Mon, Dec 24, 2012 at 2:58 PM, Glyph <gl...@twistedmatrix.com> wrote:
>>>> In my humble (but entirely, verifiably correct) opinion, thinking of
>>>> this as a default is propagating a design error in the BSD sockets
>>>> API.  Datagram and stream sockets have radically different semantics.
>>>> In Twisted, dataReceived and datagramReceived are different methods
>>>> for a good reason.  Again, it's very very easy to fall into the trap
>>>> of thinking that a TCP segment is a datagram and writing all your
>>>> application code as if it were.  After all, it probably works over
>>>> localhost most of the time!  This difference in semantics mirrored by
>>>> a difference in method naming has helped quite a few people grok the
>>>> distinction between streaming and datagrams over the years; I think
>>>> it would be a good idea if Tulip followed suit.
>>>
>>> Suppose PEP 3156 / Tulip uses data_received() for streams and
>>> datagram_received() for datagram protocols (which seems reasonable
>>> enough), what API should a datagram transport have for sending
>>> datagrams? write_datagram() and write_datagram_list()?
>>
>> Twisted just has a different method called write() which has a different
>> signature (data, address).  Probably write_datagram is better.  Why
>> write_datagram_list though?  Twisted's writeSequence is there to provide
>> the (eventual) opportunity to optimize by writev; since datagrams are
>> always sent one at a time anyway, write_datagram_list would seem to be a
>> very minor optimization.
>
> That makes sense (you can see I haven't tried to use UDP in a long time
> :-).
>
> Should write_datagram() perhaps return a future? Or is there still a
> use case for buffering datagrams?

There's not much value in returning a future even if you don't buffer.  UDP 
packets can be dropped, and there's no acknowledgement from the remote end 
either when they're received or when they're dropped.  You can get a couple 
hints from ICMP, but you can't rely on it, because lots of networks just dumbly 
filter it.

Personally I think its flow control should work the same way as a TCP stream 
just for symmetry, but the only time that this becomes important in an 
application is when you're actually saturating your entire outbound network, 
and you need to notice and slow down.  Returning a future would let you do this 
too, but might mislead users into thinking that once write_datagram completes, 
the datagram is sent and the other side has it, which is another pernicious 
idea it's hard to disabuse people of.
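
(For the record, this is essentially how it later landed in asyncio:
datagram_received() on the protocol, and a sendto() on the transport that
returns None rather than a future.  A sketch:)

    import asyncio

    class Echo(asyncio.DatagramProtocol):
        def connection_made(self, transport):
            self.transport = transport

        def datagram_received(self, data, addr):
            # fire-and-forget: no future, no delivery acknowledgement
            self.transport.sendto(data, addr)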

-glyph


Re: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted

2012-12-24 Thread Glyph
On Dec 21, 2012, at 1:10 PM, Guido van Rossum <gu...@python.org> wrote:

>>> The transport is free to buffer the bytes, but it must eventually
>>> cause the bytes to be transferred to the entity at the other end, and
>>> it must maintain stream behavior. That is, t.write(b'abc');
>>> t.write(b'def') is equivalent to t.write(b'abcdef')
>>
>> I think this is a bad idea. The kernel's network stack should do the
>> buffering (and choose appropriate algorithms for that), not the
>> user-level framework. The transport should write the bytes as soon as
>> the fd is ready for writing, and it should write the same chunks as
>> given by the user, not a concatenation of them.
>
> I asked Glyph about this. It depends on the OS... Mac syscalls are so slow
> that it is better to join in user space. This should really be up to the
> transport, although for stream transports the given equivalency should
> definitely hold.
 
It's not so much that Mac syscalls are "slow" as that syscalls are not free, 
and the cost varies.  Older versions of MacOS were particularly bad.  Some 
versions of Linux had bizarre regressions in the performance of send() or 
recv() or pipe().  The things that pass for syscalls on Windows can be 
particularly catastrophically slow (although this is practically a 
consideration for filesystem APIs, not socket APIs; who knows what the future 
will hold).

There are a number of other reasons why this should be this way as well:

- User-space has the ability to buffer indefinitely, and the kernel does not.
  Sometimes, send() returns a truncated value, and you have to deal with
  this.  Since you've allocated the memory for the value you're calling
  write() with anyway, you might as well stash it away in the framework.  The
  alternative is to let every application implement - and by "implement", I
  mean screw up - a low-performance buffering implementation.

- User-space has more information about the type of information being sent.
  If the user does write() write() write() within one loop iteration, the
  framework can hypothetically optimize that into a single syscall using
  scatter-gather I/O.  (Fun fact: we tried this, and it turns out that some
  implementations of scatter-gather I/O are actually *slower* than naive
  repeated calls; information like this should, again, be preserved within
  the framework.)

- In order to preserve compatibility with other systems (Twisted, Tornado,
  et. al.), the framework must be within its rights to do the buffering
  itself, even if it actually does exactly what you're suggesting because
  that happens to be better for performance in some circumstances.  Choosing
  different buffering strategies for different applications is an important
  tuning option.

- Applications which appear to work in some contexts if the boundaries of
  data passed to send() are exactly the same as the boundaries of the data
  sent to write() should not be coddled; this just makes them harder to debug
  later.  They should be broken as soon as possible.  This is a subtle,
  pernicious and nearly constant error that people new to networking make,
  and the sooner it surfaces, the better.  The segments passed to
  data_received() should be as different as possible from the segments passed
  to write().  (A sketch of this classic error follows.)
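
(That last error, sketched with a hypothetical blocking handler; it "works"
over localhost, where one write() usually arrives as one recv(), and then
fails in the field:)

    def handle(sock, process):
        while True:
            chunk = sock.recv(4096)   # NOT a message: may be half of one,
            if not chunk:             # or three and a half of them
                break
            process(chunk)            # hypothetical per-message handler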

>> Besides, it would be better if transports weren't automatically
>> *streaming* transports. There are connected datagram protocols, such as
>> named pipes under Windows (multiprocessing already uses non-blocking
>> Windows named pipes).
>
> I think we need to support datagrams, but the default ought to be stream.

 
In my humble (but entirely, verifiably correct) opinion, thinking of this as a 
default is propagating a design error in the BSD sockets API.  Datagram and 
stream sockets have radically different semantics.  In Twisted, dataReceived 
and datagramReceived are different methods for a good reason.  Again, it's 
very very easy to fall into the trap of thinking that a TCP segment is a 
datagram and writing all your application code as if it were.  After all, it 
probably works over localhost most of the time!  This difference in semantics 
mirrored by a difference in method naming has helped quite a few people grok 
the distinction between streaming and datagrams over the years; I think it 
would be a good idea if Tulip followed suit.

-glyph




Re: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted

2012-12-24 Thread Glyph

On Dec 21, 2012, at 1:10 PM, Guido van Rossum <gu...@python.org> wrote:

>>> TBD: Need an interface to wait for the first of a collection of Futures.
>>
>> Have you looked at Twisted's DeferredList?
>> http://twistedmatrix.com/documents/12.1.0/api/twisted.internet.defer.DeferredList.html
>
> No, I am trying to stay away from them.


Those who do not understand Deferreds are doomed to re-implement them poorly 
;-).  (And believe me, I've seen more than a few poor re-implementations at 
this point...)
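
(Postscript for the archive: the capability in question, as asyncio
eventually spelled it, is roughly the following sketch; the helper name is
illustrative.)

    import asyncio

    async def first_of(*awaitables):
        # Wait for the first of a collection of futures to complete.
        done, pending = await asyncio.wait(
            [asyncio.ensure_future(a) for a in awaitables],
            return_when=asyncio.FIRST_COMPLETED,
        )
        return next(iter(done))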

-g


Re: [Python-Dev] PEP 3145 (With Contents)

2012-12-20 Thread Glyph

On Dec 19, 2012, at 7:46 PM, anatoly techtonik <techto...@gmail.com> wrote:

>>>> On *nix it really shouldn't be select.  select cannot wait upon a file
>>>> descriptor whose value is greater than FD_SETSIZE, which means it sets
>>>> a hard (and small) limit on the number of things that a process which
>>>> wants to use this facility can be doing.
>>>
>>> I didn't know that. Should a note be added to
>>> http://docs.python.org/2/library/select ?
>>
>> The note that should be added there is simply "you should know how the
>> select system call works in C if you want to use this module."
>
> Why spread FUD if it is possible to define a good entrypoint for those who
> want to learn, but don't have enough time? Why not say directly that the
> select interface is outdated?

It's not FUD.  If you know how select() works in C, you may well want to call 
it.  It's the most portable multiplexing API, although it has a number of 
limitations.  Really, what most users in this situation ought to be using is 
Twisted, but it seems there is not sufficient interest to bundle Twisted's core 
in the stdlib.  However, the thing Guido is working on lately may be 
interoperable enough with Twisted that you can upgrade to it more easily in 
future versions of Python, so one day it may be reasonable to say select is 
outdated.

(Maybe not though.  It's a good thing nobody told me that select was deprecated 
in favor of asyncore.)

>>>> On the other hand, if you hard-code another arbitrary limit like this
>>>> into the stdlib subprocess module, it will just be another great reason
>>>> why Twisted's spawnProcess is the best and everyone should use it
>>>> instead, so be my guest ;-).
>>>
>>> spawnProcess requires a reactor. This PEP is an alternative for the
>>> proponents of green energy. =)
>>
>> Do you know what happens when you take something that is supposed to be
>> happening inside a reactor, and then move it outside a reactor?  It's not
>> called "green energy", it's called a bomb ;-).
>
> The biggest complaint about nuclear physics is that, to understand what's
> going on, it should have gone 3D long ago. =) I think Twisted needs to
> organize a competition for the best visualization of the underlying
> concepts. It will help people grasp the concepts and the different problems
> much faster (as well as gain an ability to compare different reactors).

I would love for someone to do this, of course, but now we're _really_ off 
topic.

-glyph




Re: [Python-Dev] PEP 3145 (With Contents)

2012-12-19 Thread Glyph

On Dec 19, 2012, at 2:14 PM, anatoly techtonik <techto...@gmail.com> wrote:

> On Sun, Dec 9, 2012 at 7:14 AM, Glyph <gl...@twistedmatrix.com> wrote:
>> On Dec 7, 2012, at 5:10 PM, anatoly techtonik <techto...@gmail.com> wrote:
>>>> What about reading from other file descriptors?  subprocess.Popen
>>>> allows arbitrary file descriptors to be used.  Is there any provision
>>>> here for reading and writing non-blocking from or to those?
>>>
>>> On Windows it is WriteFile/ReadFile and PeekNamedPipe. On Linux it is
>>> select. Of course a test is needed, but why it should not just work?
>>
>> This is exactly why the provision needs to be made explicitly.
>>
>> On Windows it is WriteFile and ReadFile and PeekNamedPipe - unless the
>> handle is a socket in which case it needs to be WSARecv.  Or maybe it's
>> some other weird thing - like, maybe a mailslot - and you need to call a
>> different API.
>
> IIRC on Windows there is no socket descriptor that can be used as a file
> descriptor. Seems reasonable to limit the implementation to standard file
> descriptors on this platform.

Via the documentation of ReadFile: 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx

hFile [in]
    A handle to the device (for example, a file, file stream, physical disk,
    volume, console buffer, tape drive, *socket*, communications resource,
    mailslot, or pipe). (...) For asynchronous read operations, hFile can be
    any handle that is opened with the FILE_FLAG_OVERLAPPED flag by the
    CreateFile function, or a *socket handle* returned by the socket or
    accept function.

(emphasis mine).

So, you can treat sockets as regular files in some contexts, and not in others. 
 Of course there are other reasons to use WSARecv instead of ReadFile 
sometimes, which is why there are multiple functions.

>> On *nix it really shouldn't be select.  select cannot wait upon a file
>> descriptor whose value is greater than FD_SETSIZE, which means it sets a
>> hard (and small) limit on the number of things that a process which wants
>> to use this facility can be doing.
>
> I didn't know that. Should a note be added to
> http://docs.python.org/2/library/select ?

The note that should be added there is simply "you should know how the select 
system call works in C if you want to use this module."

> I also thought that poll acts like, well, a polling function - eating 100%
> CPU while looping over inputs over and over checking if there is something
> to react to.

Nope.  Admittedly, the naming is slightly misleading.

>> On the other hand, if you hard-code another arbitrary limit like this
>> into the stdlib subprocess module, it will just be another great reason
>> why Twisted's spawnProcess is the best and everyone should use it
>> instead, so be my guest ;-).
>
> spawnProcess requires a reactor. This PEP is an alternative for the
> proponents of green energy. =)

Do you know what happens when you take something that is supposed to be 
happening inside a reactor, and then move it outside a reactor?  It's not 
called "green energy", it's called a bomb ;-).

-glyph



Re: [Python-Dev] PEP 3145 (With Contents)

2012-12-08 Thread Glyph

On Dec 7, 2012, at 5:10 PM, anatoly techtonik <techto...@gmail.com> wrote:

>> What about reading from other file descriptors?  subprocess.Popen allows
>> arbitrary file descriptors to be used.  Is there any provision here for
>> reading and writing non-blocking from or to those?
>
> On Windows it is WriteFile/ReadFile and PeekNamedPipe. On Linux it is
> select. Of course a test is needed, but why it should not just work?


This is exactly why the provision needs to be made explicitly.

On Windows it is WriteFile and ReadFile and PeekNamedPipe - unless the handle 
is a socket in which case it needs to be WSARecv.  Or maybe it's some other 
weird thing - like, maybe a mailslot - and you need to call a different API.

On *nix it really shouldn't be select.  select cannot wait upon a file 
descriptor whose value is greater than FD_SETSIZE, which means it sets a hard 
(and small) limit on the number of things that a process which wants to use 
this facility can be doing.
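
(A quick way to see that ceiling from Python; a sketch that assumes a Unix
whose hard fd limit can be raised past 1024:)

    import resource, select, socket

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
    socks = [socket.socket() for _ in range(1100)]  # fds climb past 1023
    try:
        select.select([socks[-1]], [], [], 0)
    except ValueError as err:
        print(err)  # "filedescriptor out of range in select()"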

On the other hand, if you hard-code another arbitrary limit like this into the 
stdlib subprocess module, it will just be another great reason why Twisted's 
spawnProcess is the best and everyone should use it instead, so be my guest ;-).

-glyph


Re: [Python-Dev] PEP 3145 (With Contents)

2012-12-08 Thread Glyph

On Dec 8, 2012, at 8:37 PM, Gregory P. Smith <g...@krypto.org> wrote:

> Is twisted's spawnProcess thread-safe and async-signal-safe by using
> restricted C code for everything between the fork() and exec()?  I'm not
> familiar enough with the twisted codebase to find things easily in it, but
> I'm not seeing such an extension module within twisted, and the code in
> http://twistedmatrix.com/trac/browser/trunk/twisted/internet/process.py
> certainly is not safe.  Just sayin'. :)


It's on the agenda: http://twistedmatrix.com/trac/ticket/5710.

-glyph


Re: [Python-Dev] Socket timeout and completion based sockets

2012-11-28 Thread Glyph

On Nov 28, 2012, at 12:04 PM, Guido van Rossum <gu...@python.org> wrote:

>> Anyway, as for concrete requirements: The issue I have always seen with
>> various asynchronous libraries is their lack of composability.  Everyone
>> writes their own application loop and event queue.  Merely having a
>> standard spec and reference implementation of an application main loop
>> object, and main event queue object, in the spirit of WSGI, would possibly
>> remedy this.  You could then hopefully assemble various different
>> libraries in the same application, including greenlet(*) based ones.
>
> Hm. I agree with the first part of this -- and indeed I am planning to
> make it so that tulip's event loop can easily be replaced by another
> one. I'm less sure about the yield-from-based scheduler; that's the
> kind of thing for which it doesn't really make sense to have multiple
> implementations. If greenlets can work with the standard event loop
> interface, good for them. (Either by providing a conforming
> implementation that also supports greenlets, or by just using the
> standard implementation.)

I'm really happy that you are building this in as a core feature of Tulip.  
It's really important.

Very early on, Twisted attempted to avoid this lack of composability by 
explicitly delegating to other application loops; it's one of my favorite 
features of Twisted.  Granted, no two loops we have attempted to use have 
themselves been composable, but there's not much we can do about that :).  
Still, code written on top of Twisted can always be plugged in to any other 
loop by simply using the appropriate reactor.  (There's also a plug-in 
interface for the reactor and a plug-in discovery mechanism so that third 
parties can easily provide their own reactors if they have an unusual main loop 
that isn't supported by twisted itself.)

I would also like to bring up https://github.com/lvh/async-pep again.  If 
anyone really wants to dig in and enumerate the use-cases for the _lower-level_ 
event delivery portions of Tulip, something that would be compatible with 
Twisted and Tornado and so on, that PEP already has a good skeleton and could 
use some pull requests.

-glyph


Re: [Python-Dev] Socket timeout and completion based sockets

2012-11-26 Thread Glyph
On Nov 26, 2012, at 11:05 AM, Richard Oudkerk <shibt...@gmail.com> wrote:

> Using CancelIo()/CancelIoEx() to abort an operation started with WSARecv()
> does not *seem* to cause a problem

(emphasis mine)

Little command-line experiments are not the right way to verify the behavior of 
high-performance I/O APIs.  You need to do careful reading of the 
documentation, significant testing under load and experiments on a huge variety 
of platforms.  Windows runs in _lots_ of horrible, horrible places.

I think that the safest option would really be to better document the somewhat 
mangled state that a timeout may leave a socket in.  I don't believe the 
feature was intended for pipelined protocols; if you receive a timeout by using 
the stdlib timeout functionality you have generally burned the socket.

And, generally, things that care about performance or scalability enough 
to use IOCP operations should never use timeout-sockets anyway; it may do a 
select() internally (and on Windows, where poll() is never available, it _will_ 
do a select() internally), which limits the number of file descriptors you 
might even have in your process before you start encountering spurious errors.  
The use case it supports is when you have a little tool that just needs to 
fetch a URL or something really simple, but wants to be able to get on with 
things if that doesn't work or takes too long.
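
For illustration, a minimal sketch of that "little tool" use case, assuming all 
we want is some bytes or nothing:

import socket

def fetch_banner(host, port, timeout=5.0):
    s = socket.create_connection((host, port), timeout=timeout)
    try:
        return s.recv(1024)
    except socket.timeout:
        # The socket may be mid-operation ("burned"); discard it rather
        # than trying to keep using it.
        return None
    finally:
        s.close()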

-glyph



Re: [Python-Dev] Memoryviews should expose the underlying memory address

2012-09-20 Thread Glyph
On Sep 20, 2012, at 11:35 AM, David Beazley d...@dabeaz.com wrote:

 Well, if it's supposed to do that, it certainly doesn't work for me in 3.3.  
 I get a type error about it wanting a ctypes pointer object.Even if this 
 worked, it still doesn't address the need to get the pointer value possibly 
 for some other purpose such as handing it off to a bunch of code generated 
 via LLVM. 

It seems like there's no reason to need to get the pointer value out as a 
Python integer.  If you are trying to get a pointer from a memoryview into some 
C code, or into some LLVM generated code, you still need to do the Python int 
object → C integer-of-some-kind → C pointer type conversion.  Better to just go 
straight from Python memoryview object → C pointer in one supported API call.  
Isn't this what the y* w* s* format codes are for?

Every time I have something that's a big number and I need to turn it into a 
pointer, I have to stare at the table in 
http://en.wikipedia.org/wiki/64_bit#64-bit_data_models for like 30 seconds.  
I'd rather have some Python API do the staring for me.  David, I realize that 
table is probably permanently visible in the heads-up display that your 
cybernetic implants afford you, but some of us need to make our way through C 
code with humbler faculties ;-).
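
For what it's worth, ctypes can do the staring for you today; a sketch (Python 
3, with illustrative names):

import ctypes

buf = bytearray(b"some bytes")
mv = memoryview(buf)

# Build a ctypes array that shares (not copies) the memoryview's buffer.
shared = (ctypes.c_char * len(buf)).from_buffer(mv)
addr = ctypes.addressof(shared)  # the pointer value, as a Python int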

-g



Re: [Python-Dev] A new JIT compiler for a faster CPython?

2012-07-17 Thread Glyph

On Jul 17, 2012, at 11:38 AM, Victor Stinner victor.stin...@gmail.com wrote:

 IMO PyPy is complex and hard to maintain. PyPy has a design completely
 different from CPython and is much faster and has a better memory
 footprint. I don't expect to be as fast as PyPy, just faster than
 CPython.

I think this criticism is misguided.

Let's grant for the moment that you're right, and PyPy is complex and hard to 
maintain.  If a high-level Python parser and JIT compiler written in Python 
came out as complex and unmaintainable, why do you believe that they'll be easy 
to write in C?

You are correct that it has a different architecture than CPython: it has a 
different architecture because CPython's architecture, while simple, is 
limiting, and makes it difficult to do things like write JIT compilers.  
The output of the Unladen Swallow project was illuminating in that regard.  
(Please note I said "output" and not "failure"; the Unladen Swallow folks did 
the community a great service and produced many useful artifacts, even if they 
didn't meet their original goal.)

Polluting the straightforward, portable architecture of CPython with 
significant machine-specific optimizations to bolt on extra features that are 
already being worked on elsewhere seems like a waste of effort to me.  You 
could, instead, go work on documenting PyPy's architecture so it seems less 
arcane to newcomers.  Some of the things in there which look like hideous black 
magic are actually fairly straightforward when explained, as I have learned by 
being lucky enough to receive explanations in person from Maciej, Benjamin and 
Alex at various conferences.

I mean, don't get me wrong: if this worked out, I'd love a faster CPython; I do 
still use many tools which don't support PyPy yet, so I can see the appeal 
of greater runtime compatibility with CPython than CPyExt offers.  I just think 
that it will end up being a big expenditure of effort for relatively little 
return.

If you disagree, you should feel no need to convince me; just go do it and 
prove me wrong, which I will be quite happy to be.  I would just like to think 
about whether this is the best use of your energy first.

But definitely listen to Maciej's suggestion about concentrating efforts with 
other people engaged in similar efforts, regardless :).  As your original 
message shows, there has already been enough duplication of effort in this area.

-glyph


Re: [Python-Dev] TZ-aware local time

2012-06-05 Thread Glyph

On Jun 5, 2012, at 6:16 PM, Nick Coghlan wrote:

 Personally, I'd like to see the datetime module make an explicit
 assumption that all naive datetime objects are considered to be UTC,
 with the interactions between naive and aware objects updated
 accordingly

I would absolutely love it if this were true.  In fact, I would go a step 
further and say that the whole concept of a "naive" datetime is simply a bug.  
We don't have a "naive" unicode, for example, where it's text in some encoding 
but you decline to decide which one when you decode it, leaving that to the 
caller.

When we addressed this problem for ourselves at Divmod some time ago, naive=UTC 
is exactly what we did:

http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/files/head:/Epsilon/epsilon/extime.py
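
(On Python 3.2 and later, the same convention can be spelled with just the 
stdlib; a sketch, with a helper name of my own invention:)

from datetime import timezone

def as_utc(dt):
    # Treat naive datetimes as UTC, per the naive=UTC convention above.
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)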


Re: [Python-Dev] docs.python.org pointing to Python 3 by default?

2012-05-19 Thread Glyph
On May 18, 2012, at 2:24 PM, Barry Warsaw wrote:

 At what point should we cut over docs.python.org to point to the Python 3
 documentation by default?  Wouldn't this be an easy bit to flip in order to
 promote Python 3 more better?

I would like to suggest a less all-or-nothing approach.  Just redirecting to 
Python 3 docs is going to create a lot of support headaches for people trying 
to help others learn Python.

Right now, e.g. http://docs.python.org/tutorial/index.html directly renders a 
page.  I suggest that this be changed to a redirect to 
http://docs.python.org/release/2.7/tutorial/index.html.  The fact that people 
can bookmark the default version of a document is kind of a bug.

The front page, http://docs.python.org/, could then be changed into an "are you 
looking for documentation for Python 2 or Python 3?" page, with nice big click 
targets for each (an initial suggestion: half the page each, split down the 
middle, but the web design isn't really the important thing for me).

If you want to promote Python 3, then putting "most recent version" links (for 
example, see 
http://twistedmatrix.com/documents/10.2.0/api/twisted.internet.defer.inlineCallbacks.html)
across the top of all the old versions would be pretty visible.

-glyph



Re: [Python-Dev] cpython: Issue #11750: The Windows API functions scattered in the _subprocess and

2012-04-19 Thread Glyph
On Apr 19, 2012, at 11:51 AM, Guido van Rossum wrote:

 In all those cases I think there should be some core contributors who
 know the real identity of the contributor. These must also know the
 reason for the anonymity and agree that it's important to maintain it.
 It must also be known to the community at large that the contributor
 is using a pseudonym. If the contributor is not comfortable revealing
 their identity to any core contributors, I don't think there is enough
 of a trust relationship to build on for a successful career as a
 contributor to Python.

I do think that python-dev should be clear that by "real identity" you mean 
"legal identity".

There are plenty of cases where the name a person is known by in more "real" 
situations is not in fact their legal name.  There are also cases where legal 
names are different in different jurisdictions; especially people with CJK 
names may have different orthographies of the same name in different 
jurisdictions or even completely different names in different places, if they 
have immigrated to a different country.

So there should be a legal name on file somewhere for copyright provenance 
purposes, but this should not need to be the same name that is present in 
commit logs, as long as there's a mapping recorded that can be made available 
to any interested lawyer.

(Hopefully this is not a practical issue, but this is one of my pet peeves - 
for obvious reasons.)

-glyph



Re: [Python-Dev] Require loaders set __package__ and __loader__

2012-04-15 Thread Glyph

On Apr 14, 2012, at 3:32 PM, Guido van Rossum wrote:

 Funny, I was just thinking about having a simple standard API that
 will let you open files (and list directories) relative to a given
 module or package regardless of how the thing is loaded.


Twisted has such a thing, mostly written by me, called twisted.python.modules.

Sorry if I'm repeating myself here, I know I've brought it up on this list 
before, but it seems germane to this thread.  I'd be interested in getting 
feedback from the import-wizards participating in this thread in case it is 
doing anything bad (in particular I'd like to make sure it will keep working in 
future versions of Python), but I think it may provide quite a good template 
for a standard API.

The code's here: 
http://twistedmatrix.com/trac/browser/trunk/twisted/python/modules.py

The API is fairly simple.

>>> from twisted.python.modules import getModule
>>> e = getModule("email")  # get an abstract module object (un-loaded)
>>> e
PythonModule<'email'>
>>> walker = e.walkModules()  # walk the module hierarchy
>>> walker.next()
PythonModule<'email'>
>>> walker.next()
PythonModule<'email._parseaddr'>
>>> walker.next()  # et cetera
PythonModule<'email.base64mime'>
>>> charset = e["charset"]  # get the 'charset' child module of the 'e' package
>>> charset.filePath
FilePath('.../lib/python2.7/email/charset.py')
>>> charset.filePath.parent().children()  # list the directory containing charset.py

Worth pointing out is that although in this example it's a FilePath, it could 
also be a ZipPath if you imported stuff from a zipfile.  We have an adapter 
that inspects sys.path_importer_cache and produces appropriately-shaped 
filesystem-like objects depending on where your module was imported from.  
Thank you to the authors of PEP 302; that was my religion while writing this code.

You can also, of course, ask to load something once you've identified it with 
the traversal API:

>>> charset.load()
<module 'email.charset' from '.../lib/python2.7/email/charset.pyc'>

You can also ask questions like this, which are very useful when debugging 
setup problems:

>>> ifaces = getModule("twisted.internet.interfaces")
>>> ifaces.pathEntry
PathEntry<FilePath('/Domicile/glyph/Projects/Twisted/trunk')>
>>> list(ifaces.pathEntry.iterModules())
[PythonModule<'setup'>, PythonModule<'twisted'>]

This asks what sys.path entry is responsible for twisted.internet.interfaces, and 
then what other modules could be loaded from there.  Just 'setup' and 'twisted' 
indicates that this is a development install (not surprising for one of my 
computers), since site-packages would be much more crowded.

The idiom for saying "there's a file installed near this module, and I'd like 
to grab it as a string" is pretty straightforward:

from twisted.python.modules import getModule
mod = getModule(__name__).filePath.sibling("my-file").open().read()

And hopefully it's obvious from this idiom how one might get the pathname, or a 
stream rather than the bytes.
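
(For comparison, the stdlib's PEP 302-aware helper can express roughly the 
same idiom; a sketch:)

import pkgutil

# Returns the resource's bytes (or None); works for zip imports as well.
data = pkgutil.get_data(__name__, "my-file")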

-glyph


Re: [Python-Dev] Require loaders set __package__ and __loader__

2012-04-15 Thread Glyph

On Apr 15, 2012, at 6:38 PM, Barry Warsaw wrote:

 On Apr 15, 2012, at 02:12 PM, Glyph wrote:
 
 Twisted has such a thing, mostly written by me, called
 twisted.python.modules.
 
 Sorry if I'm repeating myself here, I know I've brought it up on this list
 before, but it seems germane to this thread.  I'd be interested in getting
 feedback from the import-wizards participating in this thread in case it is
 doing anything bad (in particular I'd like to make sure it will keep working
 in future versions of Python), but I think it may provide quite a good
 template for a standard API.
 
 The code's here: 
 http://twistedmatrix.com/trac/browser/trunk/twisted/python/modules.py
 
 The API is fairly simple.
 
 >>> from twisted.python.modules import getModule
 >>> e = getModule("email") # get an abstract module object (un-loaded)
 
 Got a PEP 8 friendly version? :)

No, but I'd be happy to do the translation manually if people actually prefer 
the shape of this API!

I am just pointing it out as a source of inspiration for whatever comes next, 
which I assume will be based on pkg_resources.

-glyph


Re: [Python-Dev] PEP 418 is too divisive and confusing and should be postponed

2012-04-07 Thread Glyph Lefkowitz
On Apr 7, 2012, at 3:40 AM, Steven D'Aprano wrote:

 In any case, NTP is not the only thing that adjusts the clock, e.g. the 
 operating system will adjust the time for daylight savings.

Daylight savings time is not a clock adjustment, at least not in the sense this 
thread has mostly been talking about the word "clock".  It doesn't affect the 
seconds-from-epoch measurement; it affects the way in which the clock is 
formatted to the user.
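
A tiny demonstration of the distinction: the underlying count is untouched by 
DST, and only the formatting step consults it.

import time

t = time.time()            # seconds from epoch: DST never touches this
local = time.localtime(t)  # the formatting step, where DST is applied
print(local.tm_isdst)      # 1 during DST, 0 outside it, -1 if unknown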

-glyph


Re: [Python-Dev] this is why we shouldn't call it a monotonic clock (was: PEP 418 is too divisive and confusing and should be postponed)

2012-04-06 Thread Glyph Lefkowitz
 the latter, as it will relieve us of the need
 to repeatedly explain to newcomers: "That word doesn't mean what you
 think it means."

I don't think anything can (or should) relieve that need.

I am somewhat sympathetic to your preference for "steady" as a better overall 
term.  It does express the actually-desired property of the clock, even if that 
property isn't always present; steadiness is not a property that one can be 
tempted to synthesize, so it removes the temptation to cloud the discussion 
with that.  Ultimately I don't prefer it, because I think its provenance is 
less venerable than "monotonic", just because I have a bit more respect for the 
POSIX committee than the C++ one :-).

However, whatever choice we make in terminology, the documentation for this API 
must stress what it actually does, and what guarantee it actually provides.  In 
that sense, my preferred term for this would be the 
time.zfnrg_lfj_lpqq(ZFNRG_TIME | ZFNRG_SEMI_STEADY | ZFNRG_SEE_DOCUMENTATION).

 The main reason to use the word "monotonic clock" to refer to the
 second concept is that POSIX does so, but since Mac OS X, Solaris,
 Windows, and C++ have all avoided following POSIX's mistake, I think
 Python should too.

Do you just mean that the APIs don't have "monotonic" in the name?  They all 
use different words, which strikes me as more of a failure than a success, in 
the realm of making mistakes about communicating things :).

 Regards,
 
 Zooko
 
 ¹ http://mathworld.wolfram.com/MonotonicSequence.html


Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?

2012-04-02 Thread Glyph Lefkowitz

On Apr 2, 2012, at 10:39 AM, Kristján Valur Jónsson wrote:

 "no steps" is something unquantifiable.  All time has steps in it.

"No steps" means something very specific when referring to time APIs, as I 
recently explained here: 
http://article.gmane.org/gmane.comp.python.devel/131487/.

-glyph




Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?

2012-03-30 Thread Glyph

On Mar 30, 2012, at 8:51 PM, Victor Stinner wrote:

 time.highres() (QPC) rate is only steady during a short duration

QPC is not even necessarily steady for a short duration, due to BIOS bugs, 
unless the code running your timer is bound to a single CPU core.  
http://msdn.microsoft.com/en-us/library/ms644904 mentions 
SetThreadAffinityMask for this reason, despite the fact that it is usually 
steady for longer than that.
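
For the curious, a rough Windows-only sketch of reading QPC from Python with 
the affinity workaround MSDN suggests (error handling omitted):

import ctypes

k32 = ctypes.windll.kernel32
# Pin this thread to CPU 0 so successive QPC reads come from one core.
k32.SetThreadAffinityMask(k32.GetCurrentThread(), 1)

count = ctypes.c_int64()
k32.QueryPerformanceCounter(ctypes.byref(count))
print(count.value)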

-glyph



Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?

2012-03-30 Thread Glyph
On Mar 30, 2012, at 9:32 PM, Guido van Rossum wrote:

  - no steps
 
 You mean "not adjusted by NTP"? Except CLOCK_MONOTONIC on Linux, no
 monotonic clock is adjusted by NTP. On Linux, there is
 CLOCK_MONOTONIC_RAW, but it is only available on recent Linux kernel
 (2.6.28).
 
 Do you think that it is important to be able to refuse a monotonic
 clock adjusted by NTP? What would be the use case of such truly steady
 clock?
 
 That depends on what NTP can do to the clock. If NTP makes the clock
 tick *slightly* faster or slower in order to gradually adjust the wall
 clock, that's fine. If NTP can make it jump wildly forward or even
 backward, it's no better than time.time(), and we know why (for some
 purposes) we don't want that.

"no steps" means something very specific.  It does not mean "not adjusted by 
NTP".

In NTP, changing the clock frequency to be slightly faster or slower is called 
"slewing" (which is done with adjtime()).  Jumping by a large amount in a 
single discrete step is called "stepping" (which is done with settimeofday()).  
This is sort-of explained by http://doc.ntp.org/4.1.2/ntpd.htm.

I think I'm agreeing with Guido here when I say that, personally, my 
understanding is that slewing is generally desirable (i.e. we should use 
CLOCK_MONOTONIC, not CLOCK_MONOTONIC_RAW) if one wishes to measure real time 
(and not a time-like object like CPU cycles).  This is because the clock on the 
other end of the NTP connection from you is probably better at keeping time: 
hopefully that thirty five thousand dollars of Cesium timekeeping goodness is 
doing something better than your PC's $3 quartz crystal, after all.

So, slew tends to correct for minor defects in your local timekeeping 
mechanism, and will compensate for its tendency to go too fast or too slow.  By 
contrast, stepping only happens if your local clock is just set incorrectly and 
the re-sync delta has more to do with administrative error or failed batteries 
than differences in timekeeping accuracy.

-glyph



Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?

2012-03-30 Thread Glyph

On Mar 30, 2012, at 10:17 PM, Victor Stinner wrote:

 (...)
  By contrast, stepping only happens if your local clock is just set
 incorrectly and the re-sync delta has more to do with administrative error
 or failed batteries than differences in timekeeping accuracy.
 
 Are you talking about CLOCK_REALTIME or CLOCK_MONOTONIC?

My understanding is:

CLOCK_REALTIME is both stepped and slewed.

CLOCK_MONOTONIC is slewed, but not stepped.

CLOCK_MONOTONIC_RAW is neither slewed nor stepped.
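
(On Python 3.3 and later, where PEP 418 landed, the distinction is directly 
observable on Linux; a sketch:)

import time

slewed = time.clock_gettime(time.CLOCK_MONOTONIC)    # slewed, never stepped
raw = time.clock_gettime(time.CLOCK_MONOTONIC_RAW)   # neither (2.6.28+ only)
print(slewed, raw)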

-glyph



Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?

2012-03-30 Thread Glyph

On Mar 30, 2012, at 10:25 PM, Glyph wrote:

 
 On Mar 30, 2012, at 10:17 PM, Victor Stinner wrote:
 
 (...)
 By contrast, stepping only happens if your local clock is just set
 incorrectly and the re-sync delta has more to do with administrative error
 or failed batteries than differences in timekeeping accuracy.
 
 Are you talking about CLOCK_REALTIME or CLOCK_MONOTONIC?
 
 My understanding is:
 
 CLOCK_REALTIME is both stepped and slewed.
 
 CLOCK_MONOTONIC is slewed, but not stepped.
 
 CLOCK_MONOTONIC_RAW is neither slewed nor stepped.

Sorry, I realize I should cite my source.

This mailing list post talks about all three together: 
http://www.spinics.net/lists/linux-man/msg00973.html

Although the documentation one can find by searching around the web is really 
bad.  It looks like many of these time features were introduced, to Linux at 
least, with no documentation.

-glyph


Re: [Python-Dev] PEP 418: Add monotonic clock

2012-03-27 Thread Glyph
On Mar 26, 2012, at 10:26 PM, Zooko Wilcox-O'Hearn wrote:

 Note that the C++ standard deprecated monotonic_clock once they
 realized that there is absolutely no point in having a clock that
 jumps forward but not back, and that none of the operating systems
 implement such a thing -- instead they all implement a clock which
 doesn't jump in either direction.

This is why I don't like the C++ terminology, because it seems to me that the 
C++ standard makes incorrect assertions about platform behavior, and apparently 
they standardized it without actually checking on platform capabilities.

The clock does jump forward when the system suspends.  At least some existing 
implementations of steady_clock in C++ already have this problem, and I think 
they all might.  I don't think they can fully fix it without kernel changes, 
either.  On Linux, see discussion of a possible CLOCK_BOOTTIME in the future.  
The only current way I know of to figure out how long the system has been 
asleep is to look at the wall clock and compare, and we've already gone over 
the problems with relying on the wall clock.

Plus, libstdc++ gives you no portable way to get informed about system power 
management events, so you can't fix it even if you know about this problem, 
natch.

Time with respect to power management state changes is something that the PEP 
should address fully, for each platform.

On the other hand, hopefully you aren't controlling your Python-based CNC laser 
welder from a laptop that you are closing the lid on while the beam is in 
operation.  Not that the PEP shouldn't address it, but maybe it should just 
address it to say "you're on your own" and refer to a few platform-specific 
resources for correcting this type of discrepancy.  
(https://developer.apple.com/library/mac/#qa/qa1340/_index.html, 
http://msdn.microsoft.com/en-us/library/aa394362.aspx, 
http://upower.freedesktop.org/docs/UPower.html#UPower::Sleeping).

-glyph



Re: [Python-Dev] PEP 418: Add monotonic clock

2012-03-27 Thread Glyph
On Mar 27, 2012, at 3:17 AM, Glyph wrote:

 I don't think they can fully fix it without kernel changes

I got really curious about this and went and did some research.  With some 
really platform-specific hackery on every platform, you can mostly figure it 
out; completely on OS X and Windows, although (as far as I can tell) only 
partially on Linux and FreeBSD.

I'm not sure if it's possible to make use of these facilities without a 
Twisted-style event-loop though.  If anybody's interested, I recorded the 
results of my research in a comment on the Twisted ticket for this: 
http://twistedmatrix.com/trac/ticket/2424#comment:26.

-glyph


Re: [Python-Dev] Drop the new time.wallclock() function?

2012-03-26 Thread Glyph

On Mar 26, 2012, at 6:31 PM, Zooko Wilcox-O'Hearn wrote:

 On Fri, Mar 23, 2012 at 11:27 AM, Victor Stinner
 victor.stin...@gmail.com wrote:
 
 time.steady(strict=False) is what you need to implement timeout.
 
 No, that doesn't fit my requirements, which are about event
 scheduling, profiling, and timeouts. See below for more about my
 requirements.
 
 I didn't say this explicitly enough in my previous post:
 
 Some use cases (timeouts, event scheduling, profiling, sensing)
 require a steady clock. Others (calendaring, communicating times to
 users, generating times for comparison to remote hosts) require a wall
 clock.
 
 Now here's the kicker: each use case incur significant risks if it
 uses the wrong kind of clock.
 
 If you're implementing event scheduling or sensing and control, and
 you accidentally get a wall clock when you thought you had a steady
 clock, then your program may go seriously wrong -- events may fire in
 the wrong order, measurements of your sensors may be wildly incorrect.
 This can lead to serious accidents. On the other hand, if you're
 implementing calendaring or display of real local time of day to a
 user, and you are using a steady clock for some reason, then you risk
 displaying incorrect results to the user.
 
 So "using one kind of clock and then falling back to the other kind"
 is a choice that should be rare, explicit, and discouraged. The
 provision of such a function in the standard library is an attractive
 nuisance -- a thing that people naturally think that they want when
 they haven't thought about it very carefully, but that is actually
 dangerous.
 
 If someone has a use case which fits the "steady or else fall back to
 wall clock" pattern, I would like to learn about it.

I feel that this should be emphasized.  Zooko knows what he's talking about 
here.  Listen to him :).  (Antoine has the right idea.  I think it's well past 
time for a PEP on this feature.)

-glyph



Re: [Python-Dev] Playing with a new theme for the docs

2012-03-22 Thread Glyph Lefkowitz
On Mar 21, 2012, at 6:28 PM, Greg Ewing wrote:

 Ned Batchelder wrote:
 Any of the tweaks people are suggesting could be applied individually using 
 this technique.  We could just as easily choose to make the site 
 left-justified, and let the full-justification fans use custom stylesheets 
 to get it.
 
 Is it really necessary for the site to specify the justification
 at all? Why not leave it to the browser and whatever customisation
 the user chooses to make?

It's design.  It's complicated.

Maybe yes, if you look at research related to default usage patterns, and 
saccade distance, reading speed and retention latency.

Maybe no, if you look at research related to fixation/focus time, eye strain, 
and non-linear access patterns.

Maybe maybe, if you look at the subjective aesthetic of the page according to 
various criteria, like "does it look like a newspaper" and "do I have to resize 
my browser every time I visit a new site to get a decent width for reading".

As has been said previously in this thread several times, it's best to leave 
this up to a design czar who will at least make some decisions that will make 
some people happy.  I'm fairly certain it's not possible to create a design 
that's optimal for all readers in all cases.

-glyph



Re: [Python-Dev] Issue 13524: subprocess on Windows

2012-03-22 Thread Glyph Lefkowitz
On Mar 21, 2012, at 4:38 PM, Brad Allen wrote:

 I tripped over this one trying to make one of our Python apps at work
 Windows compatible. We had no idea that a magic 'SystemRoot'
 environment variable would be required, and it was causing issues for
 pyzmq.
 
 It might be nice to reflect the findings of this email thread on the
 subprocess documentation page:
 
 http://docs.python.org/library/subprocess.html
 
 Currently the docs mention this:
 
 Note If specified, env must provide any variables required for the
 program to execute. On Windows, in order to run a side-by-side
 assembly the specified env must include a valid SystemRoot.
 
 How about rewording that to:
 
 Note If specified, env must provide any variables required for the
 program to execute. On Windows, a valid SystemRoot environment
 variable is required for some Python libraries such as the 'random'
 module. Also, in order to run a side-by-side assembly the specified
 env must include a valid SystemRoot.

Also, in order to execute in any installation environment where libraries are 
found in non-default locations, you will need to set LD_LIBRARY_PATH.  Oh, and 
you will also need to set $PATH on UNIX so that libraries can find their helper 
programs and %PATH% on Windows so that any compiled dynamically-loadable 
modules and/or DLLs can be loaded.  And by the way you will also need to relay 
DYLD_LIBRARY_PATH if you did a UNIX-style build on OS X, not LD_LIBRARY_PATH.  
Don't forget that you probably also need PYTHONPATH to make sure any subprocess 
environments can import the same modules as their parent.  Not to mention 
SSH_AUTH_SOCK if your application requires access to _remote_ process spawning, 
rather than just local.  Oh and DISPLAY in case your subprocesses need GUI 
support from an X11 program (which sometimes you need just to initialize 
certain libraries which don't actually do anything with a GUI).  Oh and 
__CF_USER_TEXT_ENCODING is important sometimes too, don't forget that.  And 
if your subprocess is in Perl or Ruby or Java, you may need a couple dozen 
other variables which your deployment environment has set for you too.  Did I 
mention CFLAGS or LC_ALL yet?  Let me tell you a story about this one HP/UX 
machine...

Ahem.

Bottom line: it seems like screwing with the process spawning environment to 
make it minimal is a good idea for simplicity, for security, and for 
modularity.  But take it from me, it isn't.  I guarantee you that you don't 
actually know what is in your operating system's environment, and initializing 
it is a complicated many-step dance which some vendor or sysadmin or product 
integrator figured out how to do much better than your hapless Python program 
can.

%SystemRoot% is just the tip of a very big, very nasty iceberg.  Better not to 
keep refining why exactly it's required, or someone will eventually be adding a 
new variable (starting with %APPDATA% and %HOMEPATH%) that can magically cause 
your subprocess not to spawn properly to this page every six months for 
eternity.  If you're spawning processes as a regular user, you should just take 
the environment you're given, perhaps with a few specific light additions whose 
meaning you understand.  If you're spawning a process as an administrator or 
root, you should probably initialize the environment for the user you want to 
spawn that process as using an OS-specific mechanism like login(1).  (Sorry 
that I don't know the Windows equivalent.)
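
A minimal sketch of those "few specific light additions" (the variable name and 
child program are hypothetical):

import os
import subprocess

env = dict(os.environ)                    # inherit the full environment
env["MYAPP_CONFIG"] = "/etc/myapp.conf"   # hypothetical, app-specific addition

subprocess.check_call(["some-child-program"], env=env)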

-glyph



Re: [Python-Dev] cpython: Issue #10278: Add an optional strict argument to time.steady(), False by default

2012-03-20 Thread Glyph

On Mar 20, 2012, at 3:33 AM, Matt Joiner wrote:

 I believe we should make a monotonic_time method that assures monotonicity 
 and be done with it. Forward steadiness can not be guaranteed. No parameters.
 

I think this discussion has veered off a bit into the overly-theoretical.  
Python cannot really guarantee anything here; alternately, it guarantees 
everything, since if you don't like what Python gives you, you can always get 
your money back :).  It's the OS's job to guarantee things.  We can all agree 
that a monotonic clock of some sort is useful.

However, maybe my application wants CLOCK_MONOTONIC and maybe it wants 
CLOCK_MONOTONIC_RAW.  Sometimes I want GetTickCount64 and sometimes I want 
QueryUnbiasedInterruptTime.  While these distinctions are probably useless to 
most applications, they may be of interest to some, and Python really shouldn't 
make it unduly difficult to get at them.

-glyph


Re: [Python-Dev] sharing sockets among processes on windows

2012-03-14 Thread Glyph Lefkowitz

On Mar 13, 2012, at 5:27 PM, Kristján Valur Jónsson wrote:

 Hi,
 I'm interested in contributing a patch to duplicate sockets between processes 
 on Windows.
 The API to do this is WSADuplicateSocket()/WSASocket(), as already used by 
 dup() in _socketmodule.c.
 Here's what I have:

Just in case anyone is interested, we also have a ticket for this in Twisted: 
http://twistedmatrix.com/trac/ticket/4389. It would be great to share code as 
much as possible.

-glyph



Re: [Python-Dev] Python 3 optimizations, continued, continued again...

2012-02-01 Thread Glyph Lefkowitz
On Feb 1, 2012, at 12:46 PM, Guido van Rossum wrote:

 I understand that you're hesitant to just dump your current mess, and
 you want to clean it up before you show it to us. That's fine. (...) And 
 remember, it doesn't need to be
 perfect (in fact perfectionism is probably a bad idea here).

Just as a general point of advice to open source contributors, I'd suggest 
erring on the side of the latter rather than the former suggestion here: dump 
your current mess, along with the relevant caveats (it's a mess, much of it is 
irrelevant) so that other developers can help you clean it up, rather than 
putting the entire burden of the cleanup on yourself.  Experience has taught me 
that most people who hold back work because it needs cleanup eventually run out 
of steam and their work never gets integrated and maintained.

-glyph


Re: [Python-Dev] Packaging and setuptools compatibility

2012-01-24 Thread Glyph Lefkowitz
On Jan 24, 2012, at 12:54 PM, Alexis Métaireau wrote:

 I'm wondering if we should support that (a way to have plugins) in the new 
 packaging thing, or not. If not, this mean we should come with another 
 solution to support this outside of packaging (may be in distribute). If yes, 
 then we should design it, and probably make it a sub-part of packaging.

First, my interest: Twisted has its own plugin system.  I would like this to 
continue to work in the future.

I do not believe that packaging should support plugins directly.  Run-time 
metadata is not the packaging system's job.  However, the packaging system does 
need to provide some guarantees about how to install and update data at 
installation (and post-installation time) so that databases of plugin metadata 
may be kept up to date.  Basically, packaging's job is constructing explicitly 
declared parallels between your development environment and your deployment 
environment.

Some such databases are outside of Python entirely (for example, you might 
think of /etc/init.d as such a database), so even if you don't care about the 
future of Twisted's weirdo plugin system, it would be nice for this to be 
supported.

In other words, packaging should have a meta-plugin system: a way for a plugin 
system to register itself and provide an API for things to install their 
metadata, and a way to query the packaging module about the way that a Python 
package is installed so that it can put things near to it in an appropriate 
way.  (Keep in mind that "near to it" may mean in a filesystem directory, or a 
zip file, or stuffed inside a bundle or executable.)

In my design of Twisted's plugin system, we used PEP 302 as this sort of 
meta-standard, and (modulo certain bugs in easy_install and pip, most of which 
are apparently getting fixed in pip pretty soon) it worked out reasonably well. 
 The big missing pieces are post-install and post-uninstall hooks.  If we had 
those, translating to native packages for Twisted (and for things that use 
it) could be made totally automatic.

-glyph


Re: [Python-Dev] Coroutines and PEP 380

2012-01-19 Thread Glyph

On Jan 19, 2012, at 4:41 PM, Greg wrote:

 Glyph wrote:
 [Guido] mentions the point that coroutines that can implicitly switch out 
 from under you have the same non-deterministic property as threads: you 
 don't know where you're going to need a lock or lock-like construct to 
 update any variables, so you need to think about concurrency more deeply 
 than if you could explicitly always see a 'yield'.
 
 I'm not convinced that being able to see 'yield's will help
 all that much.

Well, apparently we disagree, and I work on such a system all day, every day 
:-).  It was nice to see that Matt Joiner also agreed for very similar reasons, 
and at least I know I'm not crazy.

 In any system that makes substantial use of
 generator-based coroutines, you're going to see 'yield from's
 all over the place, from the lowest to the highest levels.
 But that doesn't mean you need a correspondingly large
 number of locks. You can't look at a 'yield' and conclude
 that you need a lock there or tell what needs to be locked.

Yes, but you can look at a 'yield' and conclude that you might need a lock, and 
that you have to think about it.
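
A toy example (mine, not from the thread) of the kind of thing a visible 
'yield' flags: a lost update across a suspension point, even under a trivial 
round-robin scheduler.

counter = 0

def incrementer():
    global counter
    observed = counter
    yield                    # suspension point: other microthreads run here
    counter = observed + 1   # may clobber a concurrent update

tasks = [incrementer(), incrementer()]
for task in tasks:
    next(task)               # run each task up to its yield
for task in tasks:
    try:
        next(task)           # resume; each writes its stale observation
    except StopIteration:
        pass

print(counter)               # 1, not 2: the lost update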

Further exploration of my own feelings on the subject grew a bit beyond a good 
length for a reply here, so if you're interested in my thoughts you can have a 
look at my blog: 
http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html.

 There's no substitute for deep thought where any kind of threading is 
 involved, IMO.

Sometimes there's no alternative, but wherever I can, I avoid thinking, 
especially hard thinking.  This maxim has served me very well throughout my 
programming career ;-).

-glyph



Re: [Python-Dev] Coroutines and PEP 380

2012-01-18 Thread Glyph
On Jan 18, 2012, at 4:23 AM, Mark Shannon wrote:

 Glyph wrote:
 On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote:
 Let's start controversially: I don't like PEP 380, I think it's a kludge.
 Too late; it's already accepted.  There's not much point in making 
 controversial statements about it now.
 
 Why is it too late?

Because discussion happens before the PEP is accepted.  See the description of 
the workflow in http://www.python.org/dev/peps/pep-0001/.  The time to object 
to PEP 380 was when those threads were going on.

 Presenting this as a fait accompli does not make it any better.

But it is[1] a fait accompli, whether you like it or not; I'm first and 
foremost informing you of the truth, not trying to make you feel better (or 
worse).  Secondly, I am trying to forestall a long and ultimately pointless 
conversation :).

 The PEP mailing list is closed to most people,

The PEP mailing list is just where you submit your PEPs, and where the PEP 
editors do their work.  I'm not on it, but to my understanding of the process, 
there's not really any debate there.

 so what forum for debate is there?

python-ideas, and then this mailing list, in that order.  Regarding PEP 380 
specifically, there's been quite a bit.  See for example 
http://thread.gmane.org/gmane.comp.python.devel/102161/focus=102164.  Keep in 
mind that the purpose of debate in this context is to inform Guido's opinion.  
There's no voting involved, although he will occasionally delegate decisions 
about particular PEPs to people knowledgeable in a relevant area.

 I think this discussion would be more suitable for python-ideas though [...]
 Already been discussed:
 http://mail.python.org/pipermail/python-ideas/2011-October/012571.html

If you're following the PEP process, then the next step would be for you 
(having built some support) to author a new PEP, or to resurrect the deferred 
Stackless PEP with some new rationale - personally I'd recommend the latter.

My brief skimming of the linked thread doesn't indicate you have a lot of 
strong support though, just some people who would be somewhat interested.  So I 
still think it bears more discussion there, especially on the motivation / 
justification side of things.

 All of the objections to coroutines (as I propose) also apply to PEP 380.

You might want to see the video of Guido's Fireside Chat last year 
http://pycon.tv/#/video/100.  Skip to a little before 15:00.  He mentions the 
point that coroutines that can implicitly switch out from under you have the 
same non-deterministic property as threads: you don't know where you're going 
to need a lock or lock-like construct to update any variables, so you need to 
think about concurrency more deeply than if you could explicitly always see a 
'yield'.  I have more than one "painful event" in my past (as he refers to it) 
indicating that microthreads have the same problem as real threads :).

(And yes, they're microthreads, even if you don't have an elaborate scheduling 
construct.  If you can switch to another stack by making a function call, then 
you are effectively context switching, and it can become arbitrarily complex.  
Any coroutine in a system may introduce an arbitrarily complex microthread 
scheduler just by calling a function that yields to it.)

-glyph

([1]: Well actually it isn't; note the dashed line from "Accepted" to 
"Rejected" in the workflow diagram.  But you have to have a really darn good 
reason, and championing the rejection of a PEP that Guido has explicitly 
accepted and has liked from pretty much the beginning is going to be very, very 
hard.)



Re: [Python-Dev] Coroutines and PEP 380

2012-01-17 Thread Glyph
On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote:

 Let's start controversially: I don't like PEP 380, I think it's a kludge.

Too late; it's already accepted.  There's not much point in making 
controversial statements about it now.

 I think that CPython should have proper coroutines, rather than add more bits 
 and pieces to generators in an attempt to make them more like coroutines.


By "proper coroutines", you mean implicit coroutines (cooperative threads) 
rather than explicit coroutines (cooperative generators).  Python has been 
going in the "explicit" direction on this question for a long time.  (And, in 
my opinion, this is the right direction to go, but that's not really relevant 
here.)

I think this discussion would be more suitable for python-ideas though, since 
you have a long row to hoe here.  There's already a PEP - 
http://www.python.org/dev/peps/pep-0219/ - apparently deferred and not 
rejected, which you may want to revisit.

There are several libraries which can give you cooperative threading already; I 
assume you're already aware of greenlet and stackless, but I didn't see what 
advantages your proposed implementation provides over those.  I would guess 
that one of the first things you should address on python-ideas is why adopting 
your implementation would be a better idea than just bundling one of those with 
the standard library :).

-glyph



Re: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn.

2012-01-10 Thread Glyph

On Jan 10, 2012, at 7:57 AM, Antoine Pitrou wrote:

 On Tue, 10 Jan 2012 08:49:04 +
 Rob Cliffe rob.cli...@btinternet.com wrote:
 But "minor version" and "major version" are readily understandable to 
 the general reader, e.g. me, whereas "feature release" and "release 
 series" I find are not.  Couldn't the first two terms be defined once 
 and then used throughout?
 
 To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature
 release, e.g. 3.3.  I have a hard time considering 3.2 or 3.3 "minor".

Whatever your personal feelings, there is a precedent established in the API:

>>> sys.version_info.major
2
>>> sys.version_info.minor
7
>>> sys.version_info.micro
1

This strikes me as the most authoritative definition of the terms, in the 
context of Python.  (Although the fact that this precedent is widely 
established elsewhere doesn't hurt.)

Whatever term is chosen, the important thing is to apply the terminology 
consistently so that it's clear what is meant.  I doubt that anyone has a term 
which every reader will intuitively and immediately associate with middle 
dot-separated digit increment by one.

If you want to emphasize the importance of a release, just choose a subjective 
term aside from major or minor.

-glyph



Re: [Python-Dev] Fixing the XML batteries

2011-12-10 Thread Glyph Lefkowitz
On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote:

 Note, however, that html5lib is likely way too big to add it to the stdlib, 
 and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, 
 which would be the target release series for better HTML support. So, 
 whatever library or API you would want to use for HTML processing is 
 currently only the second question as long as Py3 lacks a real-world HTML 
 parser in the stdlib, as well as a robust character detection mechanism. I 
 don't think that can be fixed all that easily.


Here's the problem in a nutshell, I think:

* Everybody wants an HTML parser in the stdlib, because it's inconvenient to 
pull in a dependency for such a simple task.
* Everybody wants the stdlib to remain small, stable, and simple and not get 
overcomplicated.
* Parsing arbitrary HTML5 is a monstrously complex problem, for which there 
exist rapidly-evolving standards and libraries to deal with it.  Parsing 'the 
web' (which is rapidly growing to include stuff like SVG, MathML, etc.) is even 
harder.

My personal opinion is that HTML5Lib gets this problem almost completely right, 
and so it should be absorbed by the stdlib.  Trying to re-invent this from 
scratch, or even use something like BeautifulSoup which uses a bunch of 
heuristics and hacks rather than reference to the laboriously-crafted standard 
that says exactly how parsing malformed stuff has to go to be like a browser, 
seems like it will just give the stdlib solution a reputation for working on 
the test input but not working in the real world.

(No disrespect to BeautifulSoup: it was a great attempt in the pre-HTML5 world 
which it was born into, and I've used it numerous times to implement useful 
things.  But much more effort has been poured into this problem since then, and 
the problems are better understood now.)

-glyph



Re: [Python-Dev] Fixing the XML batteries

2011-12-10 Thread Glyph Lefkowitz

On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

 A little data: the HTML5lib project lives at
 https://code.google.com/p/html5lib/
 It has 4 owners and 22 other committers.
 
 The most recent release, html5lib 0.90 for Python, is nearly 2 years old. 
 Since there is a separate Python3 repository, and there is no mention on 
 Python3 compatibility elsewhere that I saw, including the pypi listing, I 
 assume that is for Python2 only.

I believe that you are correct.

 A comment on a recent (July 11) Python3 issue
 https://code.google.com/p/html5lib/issues/detail?id=187colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port
 suggests that the Python3 version still has problems: "Merged in now, though 
 still lots of errors and failures in the testsuite."


I don't see what bearing this has on the discussion.  There are three possible 
ways I can imagine to interpret this information.

First, you could believe that porting a codebase from Python 2 to Python 3 is 
much easier than solving a difficult domain-specific problem.  In that case, 
html5lib has done the hard part and someone interested in html-in-the-stdlib 
should do the rest.

Second, you could believe that porting a codebase from Python 2 to Python 3 is 
harder than solving a difficult domain-specific problem, in which case 
something is seriously wrong with Python 3 or its attendant migration tools and 
that needs to be fixed, so someone should fix that rather than worrying about 
parsing HTML right now.  (I doubt that many subscribers to this list would 
share this opinion, though.)

Third, you could believe that parsing HTML is not a difficult domain-specific 
problem.  But only a crazy person would believe that, so you're left with one 
of the previous options :).

-glyph



Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-09 Thread Glyph

On Dec 9, 2011, at 12:43 AM, Guido van Rossum wrote:

 Even if it weren't slow, I still wouldn't use it to automatically
 convert code at install time; a single codebase is easier to reason
 about, and easier to support.  Users send me tracebacks all the time;
 having them match the source is a wonderful thing.
 
 Even though 2to3 was my idea, I am gradually beginning to appreciate this 
 approach. I skimmed the docs for six and liked it.

Actually, maybe I like it a bit better than I thought.

The biggest issue for the single-codebase approach is 'except ... as ...'.  
Peppering one's codebase with calls to sys.exc_info() can be a real performance 
problem, especially on PyPy.  Not to mention how ugly it is.  For some reason I 
thought that this syntax was only supported by 2.7 and up; I see now that it's 
2.6 and up.
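
To make the trade-off concrete, a sketch of the two spellings (risky() is a 
stand-in):

import sys

def risky():
    raise ValueError("boom")

# The 2.5-compatible spelling, with the cost described above:
try:
    risky()
except ValueError:
    e = sys.exc_info()[1]

# The 2.6+/3.x spelling, unavailable on 2.5:
try:
    risky()
except ValueError as e:
    pass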

This is still a problem for 2.5 support, of course, but 2.6-only may not be too 
far away for many projects; Twisted's support schedule for Python versions 
typically follows Ubuntu's, which means that we might be able to drop 2.5 as 
early as 2013! :).  Even in the plans that involve 2to3, though, "drop 
everything prior to 2.6" was always supposed to be step 0, so "single codebase" 
adds much less of a burden than I thought.

-glyph



Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-08 Thread Glyph
On Dec 8, 2011, at 7:32 AM, Nick Coghlan wrote:
 Having just purged so much cruft from the language, pleas to add some back 
 permanently for a problem that is going to fade from significance within the 
 next couple of years are unlikely to get very far.
 

This problem is never going to go away.

This is not a comment on the success of py3, but rather the persistence of old 
versions of things.  Even assuming an awesomely optimistic schedule for py3k 
migrations, even assuming that *everything* on PyPI supports Py3 by the end of 
2013, consider that all around the world, every day, new code is still being 
written in FORTRAN.  Much of it is being written in FORTRAN 77, despite the 
fact that Fortran 90 is now over 20 years old.  Efforts still crop up 
periodically (some successful, some failed) to migrate these legacy projects 
to other languages, some of them as modern as C.

There are plenty of proprietary Python 2 systems which exist today for which 
there will not be a budget for a Python 3 migration this decade.  If history is 
an accurate guide, people will still be hired to work on python 2.x systems in 
the year 2100.  Some of them will be being hired to migrate that python 2.x 
code to python 3 (or 4, or 5, whatever we have by then).  If they're not, it 
will be because they're being hired to try to migrate it to Javascript instead, 
not because the Python 3 migration is done by then.

-glyph



Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-08 Thread Glyph
Zooming back in to the actual issue this thread is about, I think the u''-vs-'' 
issue is a bit of a red herring, because the _real_ problem here is that 2to3 
is slow and buggy, and so migration efforts are starting to work around it, and 
therefore want to run the same code on 3.x and all the way back to 2.5.

In my opinion, effort should be spent on optimizing the suggested migration 
tools and getting them to work properly, not twiddling the syntax so that it's 
marginally easier to avoid them.

On Dec 8, 2011, at 4:27 PM, Martin v. Löwis wrote:

 This is not a comment on the success of py3, but rather the persistence
 of old versions of things.  Even assuming an awesomely optimistic
 schedule for py3k migrations, even assuming that *everything* on PyPI
 supports Py3 by the end of 2013, consider that all around the world,
 every day, new code is still being written in FORTRAN.
 
 While this is true for FORTRAN, it is not for Python 1.5: no new
 Python 1.5 code is written around the world, at least not every day.
 Also for FORTRAN, new code that is written every day likely isn't
 FORTRAN 66, but more likely FORTRAN 90 or newer.

That's because Python 1.5 was upward-compatible with 2.x, and pretty much 
everyone could gently migrate, and start developing on the new versions even 
while supporting the old ones.  That is obviously not true of 3.x, by design; 
2to3 requires that you still develop on the old version even if you support a 
new one, not to mention the substantially increased effort of migration.

 The reason for that is that FORTRAN just isn't an obsolete language,
 by any means, else people wouldn't bother producing new versions of
 it, porting compilers to new processors, and so on. Contrast this to
 Python 1, and soon Python 2, which actually *is* obsolete (just as
 FORTRAN 66 *is* obsolete).

Much as the Python core team might wish Python 2 would soon be obsolete, all 
of these things are happening for python 2.x now and all indications are that 
they will continue to happen.  PyPy, Jython, ShedSkin, Skulpt, IronPython, and 
possibly a few others are (to varying degrees) all targeting 2.x right now, 
because that's where the application code they want to run is.  PyPy is even 
porting the JIT compiler to a new processor (ARM).

F66 is indeed obsolete, but it became obsolete because people stopped using it, 
not because the standards committee declared it so.

 Much of it is being in FORTRAN 77
 
 Can you prove this? I trust that existing code is being maintained
 in FORTRAN 77. For new code, I'm skeptical.

I am not deeply immersed in the world where F77 is still popular, so I don't 
have any citations for you, but casual conversations with people working in the 
sciences, especially chemistry and materials science, suggest to me that a lot 
of people still write F77 and start new projects in it.  (I see that someone 
with more direct experience has promptly replied in this thread already, 
anyway.)

 There are plenty of proprietary Python 2 systems which exist today for
 which there will not be a budget for a Python 3 migration this decade.
 
 And people using it can happily continue to use Python 2. If they
 don't have a need to port their code to Python 3, they are not concerned
 by whether you use a u prefix for strings in Python 3 or not.


I didn't say they didn't have a need ever, I said they didn't have a budget 
now.  What you are saying to those users here is basically: "if you can't 
migrate today, then just don't bother; we're never going to make it any 
easier."  Despite the fact that I ultimately agree on u'' (nobody should care 
about this), it is not a good message to send.

-glyph


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-11-30 Thread Glyph

On Nov 30, 2011, at 6:39 PM, Nick Coghlan wrote:

 On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby p...@telecommunity.com wrote:
 It doesn't help at all that I'm not really in a position to provide an
 implementation, and the persons most likely to implement have been leaning
 somewhat towards 382, or wanting to modify 402 such that it uses .pyp
 directory extensions so that PEP 395 can be supported...
 
 While I was initially a fan of the possibilities of PEP 402, I
 eventually decided that we would be trading an easy problem ("you need
 an '__init__.py' marker file or a '.pyp' extension to get Python to
 recognise your package directory") for a hard one ("What's your
 sys.path look like? What did you mean for it to look like?"). Symlinks
 (and the fact we implicitly call realpath() during system
 initialisation and import) just make things even messier.
 *Deliberately* allowing package structures on the filesystem to become
 ambiguous is a recipe for future pain (and could potentially undo a
 lot of the good work done by PEP 328's elimination of implicit
 relative imports).
 
 I acknowledge there is a lot of confusion amongst novices as to how
 packages and imports actually work, but my diagnosis of the root cause
 of that problem is completely different from that supposed by PEP 402
 (as documented in the more recent versions of PEP 395, I've come to
 believe it is due to the way we stuff up the default sys.path[0]
 initialisation when packages are involved).
 
 So, in the end, I've come to strongly prefer the PEP 382 approach. The
 principle of Explicit is better than implicit applies to package
 detection on the filesystem just as much as it does to any other kind
 of API design, and it really isn't that different from the way we
 treat actual Python files (i.e. you can *execute* arbitrary files, but
 they need to have an appropriate extension if you want to import
 them).

I've helped an almost distressing number of newbies overcome their confusion 
about sys.path and packages.  Systems using Twisted are, almost by definition, 
hairy integration problems, and are frequently being created or maintained by 
people with little to no previous Python experience.

Given that experience, I completely agree with everything you've written above 
(except for the part where you initially liked it).  I appreciate the insight 
that PEP 402 offers about python's package mechanism (and the difficulties 
introduced by namespace packages).  Its statement of the problem is good, but 
in my opinion its solution points in exactly the wrong direction: packages need 
to be _more_ explicit about their package-ness and tools need to be stricter 
about how they're laid out.  It would be great if sys.path[0] were actually 
correct when running a script inside a package, or at least issued a warning 
which would explain how to correctly lay out said package.  I would love to see 
a loud alarm every time a module accidentally got imported by the same name 
twice.  I wish I knew, once and for all, whether it was 'import Image' or 'from 
PIL import Image'.

My hope is that if Python starts to tighten these things up a bit, or at least 
communicate better about best practices, editors and IDEs will develop better 
automatic discovery features and frameworks will start to normalize their 
sys.path setups and stop depending on accidents of current directory and script 
location.  This will in turn vastly decrease confusion among new python 
developers taking on large projects with a bunch of libraries, who mostly don't 
care what the rules for where files are supposed to go are, and just want to 
put them somewhere that works.

-glyph


Re: [Python-Dev] Warnings

2011-11-30 Thread Glyph

On Dec 1, 2011, at 1:10 AM, Raymond Hettinger wrote:

 When updating the documentation, please don't go overboard with warnings.
 The docs need to be worded affirmatively -- say what a tool does and show how 
 to use it correctly.
 See http://docs.python.org/documenting/style.html#affirmative-tone
 
 The docs for the subprocess module currently have SEVEN warning boxes on one 
 page:
 http://docs.python.org/library/subprocess.html#module-subprocess
 The implicit message is that our tools are hazardous and should be avoided.
 
 Please show some restraint and aim for clean looking, high-quality technical 
 writing without the FUD.
 
 Look at the SQLite3 docs for an example of good writing.  The prevention of 
 SQL injection attacks is discussed briefly and effectively without big red 
 boxes littering the page.

I'm not convinced this is actually a great example of how to outline pitfalls 
clearly; it doesn't say what an SQL injection attack is, or what the 
consequences might be.

Also, it's not the best example of a positive tone.  The narrative is:

"You probably want to do X."
"Don't do Y, because it will make you vulnerable to a Q attack."
"Instead, do Z."
"Here's an example of Y.  Don't do it!"
"Okay, finally, here's an example of Z."

It would be better to say "You probably want to do X.  Here's how you do X, 
with Z.  Here's an example of Z."  Then, later, discuss why some people want to 
do Y, and why you should avoid that impulse.
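To make the contrast concrete, here is a minimal sqlite3 sketch (the table and 
hostile input are invented for illustration); the parameterized form plays the 
role of "Z", the interpolated form the role of "Y":

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (name TEXT)")

    name = "Robert'); DROP TABLE users;--"  # hostile input

    # Z: a '?' placeholder lets the driver quote the value safely.
    con.execute("INSERT INTO users (name) VALUES (?)", (name,))

    # Y: string interpolation -- the injection-prone form to avoid.
    # con.execute("INSERT INTO users (name) VALUES ('%s')" % name)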

However, what 'subprocess' is doing clearly isn't an improvement: it's not an 
effective introduction to secure process execution, just a reference document 
punctuated with ambiguous anxiety.  sqlite3 is at least somewhat specific :).

I think both of these documents point to a need for a recommended idiom for 
discussing security, or at least common antipatterns, within the Python 
documentation.  I like the IETF's security considerations section, because it 
separates things off into a section that can be referred to later, once the 
developer has had an opportunity to grasp the basics.  Any section with 
security implications can easily say "please refer to the 'security 
considerations' section for important information on how to avoid common 
mistakes" without turning into a big security digression on its own.

-glyph



Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as

2011-10-07 Thread Glyph
On Oct 7, 2011, at 5:10 AM, Stephen J. Turnbull wrote:

 The principle here is that "ran as root" without further explanation is a
 litmus test for not bothering about security, even today.  It's
 worth asking for explanation, or at least a comment that all the
 buildbot contributors I've talked to have put a lot of effort into
 security configuration.

This is a valid point.  I think that Cameron and I may have had significantly 
different assumptions about the environment being discussed here.  I may have 
brought some assumptions about the build farm here that don't actually apply to 
the way Python does it.

To sum up what I believe is now the consensus from this thread:

- Anyone setting up a buildslave should take care to invoke the build in an 
environment where an out-of-control buildbot, potentially executing arbitrarily 
horrible and/or malicious code, cannot damage anything.  Builders should 
always be isolated from valuable resources, although the specific mechanism of 
isolation may differ.  A virtual machine is a good default, but may not be 
sufficient; other tools for cutting off the builder from the outside world would 
be chroot jails, Solaris zones, etc.
- Code runs differently as privileged vs. unprivileged users.  Therefore builders 
should be set up in both configurations, running the full test suite, to ensure 
that all code runs as expected in both configurations.  Some tests, as the 
start of this thread indicates, must have some special logic to make sure they 
do or do not run, or run differently, in privileged vs. unprivileged 
configurations, but generally speaking most things should work in both places.
- Access to root may provide access to slightly surprising resources, even within 
a VM (such as the ability to send spoofed IP packets, change the MAC address of 
even virtual ethernet cards, etc.), and administrators should be aware that this 
is the case when configuring the host environment for a run-as-root builder.  
You don't want to end up with a compromised test VM that can snoop on your 
network.

Have I left anything out? :-)



Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as

2011-10-07 Thread Glyph

On Oct 7, 2011, at 6:40 AM, Cameron Simpson wrote:

 I think that the build and the tests should be different security
 scopes/zones/levels: different users or different VMs. Andrew's
 suggestion of a VM-for-tests sounds especially good.

To me, "build" and "test" are largely the same function, since a build whose 
tests haven't been run is just a bag of bits :).  But in the sense that root 
should never be required to do a build, I don't see a reason to bother 
supporting that configuration: it makes sense to always do the build as a 
regular user.

 And that I think the as-root test suite shouldn't run unless the
 not-root test suite passes.


Why's that?  The as-root VM needs to be equally secure either way, and it's a 
useful data point to see that the as-root tests *didn't* break, if they didn't; 
this way a developer can tell at a glance that the failure is either a test 
that needs to be marked as 'root only' or a change that causes permissions to 
be required that it shouldn't have.

(In general I object to suggestions of the form "don't run the tests unless X", 
unless X is a totally necessary pre-requisite like "the compile finished".)



Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as

2011-10-07 Thread Glyph

On Oct 7, 2011, at 7:10 AM, Cameron Simpson wrote:

 The point here is security, not test coverage: if a procedure is known
 to be broken as a regular user, is it not highly unsafe to then run it
 as root?

No. As I mentioned previously, any environment where the tests are run should 
be isolated from any resources that are even safety-relevant, let alone 
safety-critical, whether they're running as a regular user _or_ root.

In theory, one might automatically restore the run-as-root buildslave VM from a 
snapshot before every single test run.  In practice this is probably too 
elaborate to bother with and an admin can just hit the 'restore' button in the 
fairly unlikely case that something does happen to break the buildslave.



Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as

2011-10-06 Thread Glyph
On Oct 5, 2011, at 10:46 PM, Cameron Simpson wrote:

 Surely VERY FEW tests need to be run as root, and they need careful
 consideration. The whole thing (build, full test suite) should
 not run as root.

This is news to me - is most of Python not supposed to run as root?  I was 
under the impression that Python was supposed to run correctly as root, and 
therefore there should be some buildbots dedicated to running it that way.  If 
only a few small parts of the API are supposed to work as root, perhaps this should be 
advertised more clearly in the documentation?

Ahem.  Sorry for the snark, I couldn't resist.  As Terry more reasonably put it:

 running buildbot tests as root does not reflect the experience of non-root 
 users. It seems some tests need to be run both ways just for correctness 
 testing.

(except I'd say "all", not "some")

 Am I really the only person who feels unease about this scenario?


More seriously: apparently you are not, but I am quite surprised by that 
revelation.  You should be :).  The idea of root as a special, magical place 
where "real ultimate power" resides is quite silly.  "root" is a title, like 
"king".  You're not just root, you're root _of_ something.  If the thing that 
you are root of is a dedicated virtual machine with no interesting data besides 
the code under test, then this is quite a lot like being a regular user in a 
similarly boring place.  It's like having the keys to an empty safe.

Similarly, if you're a normal unprivileged user - let's say, www-data - on a 
system with large amounts of sensitive data owned by that user, becoming root 
will rarely grant you any really interesting privileges beyond what you've 
already got.  Most public web-based systems fall into this category, as you've 
got one user (the application deployment user) running almost all of your code, 
with privileges to read and write to the only interesting data source (the 
database).  So if these tests were running on somebody's public-facing 
production system in an unprivileged context, I'd be far more concerned about 
that than about it having root on some throwaway VM.

-glyph




Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as

2011-10-06 Thread Glyph

On Oct 6, 2011, at 10:11 PM, Cameron Simpson wrote:

 Hmm. Glyph seemed to be arguing both ways - that everything should be
 tested as root, and also that root is not special. I have unease over the
 former and disagreement over the latter.

Your reply to Stephen suggests that we are actually in agreement, but just to 
be clear: I completely understand that root is special in that the environment 
allows for several behaviors which are not true for a normal user.  Which is 
precisely why it must be tested by a (properly sandboxed) buildbot :).

It's just not special in the sense that having root on a throwaway VM would 
allow you to do non-throwaway things.  The one thing one must always be careful 
of, of course, is having your bandwidth chewed up for some nefarious purpose 
(spam, phishing) but that sort of thing should be caught with other monitoring 
tools.

Plus, there are lots of other impediments to getting Python's buildbots to do 
something nasty.  Only people with a commit bit should be able to actually push 
changes that buildbot will see.  So avoiding root is more about avoiding 
mistakes than avoiding attacks.  (After all, if this process isn't completely 
secure, then neither is the Python that's shipped in various OSes: in which 
case, game over _everywhere_.)

Finally, and unfortunately, there are so many privilege escalation exploits in 
so many different daemons and applications that it's foolish to treat root as 
too terribly special: unless you're a real hardening expert and you spend a lot 
of effort keeping up to the second on security patches, the ability to execute 
completely arbitrary untrusted code as an unprivileged local user on your system 
can likely be converted with little effort into the ability to execute 
arbitrary untrusted code as root.  Although, ironically, buildbots are often 
minimally configured and don't run any other services, so maybe these 
environments are one of the few places where it actually does make a difference 
:-).

(Which is precisely why all daemons everywhere should be written in Python.  
Buffer overflows are dumb, it's 2011 already, come on.  Use Twisted.)


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-11 Thread Glyph Lefkowitz

On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:

 Does anyone *actually* use .title() for this? (And why not just use the 
 correct casing in the string literal...)

Yes.  Twisted does, in various MIME-ish places (IMAP, SIP), although not in 
HTTP from what I can see.  I imagine other similar software would as well.

One issue is that you don't always have a string literal to work with.  If 
you're proxying traffic, you start from a mis-cased header and you possibly 
need to correct it to a canonically-cased one.  (On at least one occasion I've 
had to use such a proxy to make certain buggy client software work.)

Of course you could have something like {b"CONNECTION-LOST": 
b"Connection-Lost", ...} somewhere at module scope, but that feels a bit 
sillier than just having a nice '.title()' method.
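A minimal Python 3 sketch of that proxy scenario (the header name here is 
invented; bytes objects have .title() along with the other case methods):

    wire_header = b'cONNECTION-lOST'   # mis-cased input from a buggy peer
    canonical = wire_header.title()    # b'Connection-Lost'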

-glyph



Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Glyph Lefkowitz
On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:

 How about title?

 >>> 'content-length'.title()
 'Content-Length'

You might say that since the protocol has to be case-insensitive, this is a silly 
frill; but there are definitely enough case-sensitive crappy bits of network 
middleware out there that this function is critically important for an HTTP 
server.

In general I'd like to defend keeping as many of these methods as possible for 
compatibility (porting to Py3 is already hard enough).  Although even I might 
have a hard time defending 'swapcase', which is never used _at all_ within 
Twisted, on text or bytes.  The only use-case I can think of for that method is 
goofy joke text filters, and it wouldn't be very good at that either.

-glyph



Re: [Python-Dev] Python 3 optimizations continued...

2011-09-01 Thread Glyph Lefkowitz

On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote:

 A simple solution: when tracing is enabled, the new instruction format will 
 never be executed (and information tracking disabled as well).

Correct me if I'm wrong: doesn't this mean that no profiler will be able to 
accurately measure the performance impact of the new instruction format, and 
that therefore one may get incorrect data when one is trying to make a CPU 
optimization for real-world performance?




Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)

2011-08-28 Thread Glyph Lefkowitz

On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:

 In general, an existing library cannot be called
 without access to its .h files -- there are probably struct and
 constant definitions, platform-specific #ifdefs and #defines, and
 other things in there that affect the linker-level calling conventions
 for the functions in the library.

Unfortunately I don't know a lot about this, but I keep hearing about something 
called rffi that PyPy uses to call C from RPython: 
http://readthedocs.org/docs/pypy/en/latest/rffi.html.  This has some 
shortcomings currently, most notably the fact that it needs those .h files (and 
therefore a C compiler) at runtime, so it's currently a non-starter for code 
distributed to users.  Not to mention the fact that, as you can see, it's not 
terribly thoroughly documented.  But, that ExternalCompilationInfo object 
looks very promising, since it has fields like includes, libraries, etc.

Nevertheless it seems like it's a bit more type-safe than ctypes or cython, and 
it seems to me that it could cache some of that information that it extracts 
from header files and store it for later when a compiler might not be around.

Perhaps someone with more PyPy knowledge than I could explain whether this is a 
realistic contender for other Python runtimes?



Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread Glyph Lefkowitz

On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

 That is, the above code hardocdes a variety of assumptions about the import 
 system that haven't been true since Python 2.3.

Thanks for this feedback.  I honestly did not realize how old and creaky this 
code had gotten.  It was originally developed for Python 2.4 and it certainly 
shows its age.  Practically speaking, the code is correct for the bundled 
importers, and paths and zipfiles are all we've cared about thus far.

 (For example, it assumes that the contents of sys.path strings have 
 inspectable semantics, that the contents of __file__ can tell you things 
 about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - 
walk the module hierarchy without importing any code.  So some heuristics are 
necessary.  Upon further reflection, PEP 402 _will_ make dealing with namespace 
packages from this code considerably easier: we won't need to do AST analysis 
to look for a __path__ attribute or anything gross like that to improve 
correctness; we can just look in various directories on sys.path and accurately 
predict what __path__ will be synthesized to be.

However, the isPackage() method can and should be looking at the module if it's 
already loaded, and not always guessing based on paths.  The whole reason 
there's an 'importPackages' flag to walk() is that some applications of this 
code care more about accuracy than others, so it tries to be as correct as it 
can be.

(Of course this is still wrong for the case where a __path__ is dynamically 
constructed by user code, but there's only so much one can do about that.)
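To be concrete about the isPackage() fix, something like this sketch - not the 
actual twisted.python.modules code; the path-guessing fallback is a 
hypothetical stand-in:

    import sys

    def is_package(fqname, guess_from_paths):
        # Trust the live module object when it has already been imported;
        # only fall back to filesystem heuristics when it hasn't.
        mod = sys.modules.get(fqname)
        if mod is not None:
            return hasattr(mod, '__path__')
        return guess_from_paths(fqname)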

 If you want to fully support PEP 302, you might want to consider making this 
 a wrapper over the corresponding pkgutil APIs (available since Python 2.5) 
 that do roughly the same things, but which delegate all path string 
 inspection to importer objects and allow extensible delegation for importers 
 that don't support the optional methods involved.

This code still needs to support Python 2.4, but I will make a note of this for 
future reference.

 (Of course, if the pkgutil APIs are missing something you need, perhaps you 
 could propose additions.)

 Now it seems like pure virtual packages are going to introduce a new type of 
 special case into the hierarchy which have neither .pathEntry nor .filePath 
 objects.
 
 The problem is that your API's notion that these things exist as coherent 
 concepts was never really a valid assumption in the first place.  .pth files 
 and namespace packages already meant that the idea of a package coming from a 
 single path entry made no sense.  And namespace packages installed by 
 setuptools' system packaging mode *don't have a __file__ attribute* today...  
 heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of 
these design decisions.

 So, adding virtual packages isn't actually going to change anything, except 
 perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with.  
Thanks for pointing them out.  My opinion of PEP 402 has been completely 
reversed - although I'd still like to see a section about the module system 
from a library/tools author's point of view rather than a time-traveling Perl 
user's narrative :).



Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread Glyph Lefkowitz

On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

 At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
 Upon further reflection, PEP 402 _will_ make dealing with namespace packages 
 from this code considerably easier: we won't need to do AST analysis to look 
 for a __path__ attribute or anything gross like that improve correctness; we 
 can just look in various directories on sys.path and accurately predict what 
 __path__ will be synthesized to be.
 
 The flip side of that is that you can't always know whether a directory is a 
 virtual package without deep inspection: one consequence of PEP 402 is that 
 any directory that contains a Python module (of whatever type), however 
 deeply nested, will be a valid package name.  So, you can't rule out that a 
 given directory *might* be a package, without walking its entire reachable 
 subtree.  (Within the subset of directory names that are valid Python 
 identifiers, of course.)

Are there any rules about passing invalid identifiers to __import__ though, or 
is that just less likely? :)

 However, you *can* quickly tell that a directory *might* be a package or is 
 *probably* one: if it contains modules, or is the same name as an 
 already-discovered module, it's a pretty safe bet that you can flag it as 
 such.

I still like the idea of a 'marker' file.  It would be great if there were a 
new marker like __package__.py.  I say this more for the benefit of users 
looking at a directory on their filesystem and trying to understand whether 
this is a package or not than I do for my own programmatic tools though; it's 
already hard enough to understand the package-ness of a part of your filesystem 
and its interactions with PYTHONPATH; making directories mysteriously and 
automatically become packages depending on context will worsen that situation, 
I think.

I also have this not-terribly-well-defined idea that it would be handy for 
different providers of the _contents_ of namespace packages to provide their 
own instrumentation to be made aware that they've been added to the __path__ of 
a particular package.  This may be a solution in search of a problem, but I 
imagine that each __package__.py would be executed in the same module 
namespace.  This would allow namespace packages to do things like set up 
compatibility aliases, lazy imports, plugin registrations, etc, as they 
currently do with __init__.py.  Perhaps it would be better to define its 
relationship to the package-module namespace in a more sensible way than 
"execute all over each other in no particular order".

Also, if I had my druthers, Python would raise an exception if someone added a 
directory marked as a package to sys.path, to refuse to import things from it, 
and when a submodule was run as a script, add the nearest directory not marked 
as a package to sys.path, rather than the script's directory itself.  The whole 
"__name__ is wrong because your current directory was wrong when you ran that 
command" thing is so confusing to explain that I hope we can eventually consign 
it to the dustbin of history.  But if you can't even reasonably guess whether a 
directory is supposed to be an entry on sys.path or a package, that's going to 
be really hard to do.

 In any case, you probably should *not* do the building of a virtual path 
 yourself; the protocols and APIs added by PEP 402 should allow you to simply 
 ask for the path to be constructed on your behalf.  Otherwise, you are going 
 to be back in the same business of second-guessing arbitrary importer 
 backends again!

What do you mean by "building of a virtual path"?

 (E.g. note that PEP 402 does not say virtual package subpaths must be 
 filesystem or zipfile subdirectories of their parents - an importer could 
 just as easily allow you to treat subdirectories named 'twisted.python' as 
 part of a virtual package with that name!)
 
 Anyway, pkgutil defines some extra methods that importers can implement to 
 support module-walking, and part of the PEP 402 implementation should be to 
 make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the 
happier I'll be :).
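For reference, pkgutil's existing discovery entry point (2.5 and later) can 
already list importable names without executing the modules themselves:

    import pkgutil

    # iter_modules() reports what could be imported from sys.path (or a
    # given list of paths) without actually importing any of it.
    for finder, name, ispkg in pkgutil.iter_modules():
        print("%s (%s)" % (name, "package" if ispkg else "module"))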

 This code still needs to support Python 2.4, but I will make a note of this 
 for future reference.
 
 A suggestion: just take the pkgutil code and bundle it for Python 2.4 as 
 something._pkgutil.  There's very little about it that's 2.5+ specific, at 
 least when I wrote the bits that do the module walking.
 
 Of course, the main disadvantage of pkgutil for your purposes is that it 
 currently requires packages to be imported in order to walk their child 
 modules.  (IIRC, it does *not*, however, require them to be imported in order 
 to discover their existence.)

One of the stipulations of this code is that it might give different results 
when the modules are loaded and not.  So it's fine to inspect that first and 
then invoke pkgutil only in the 'loaded' case, with the knowledge that the 
not-loaded case may give different (and possibly less accurate) results.

Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread Glyph Lefkowitz
On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote:

 On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
 
 * XXX what is the __file__ of a pure virtual package?  ``None``?
  Some arbitrary string?  The path of the first directory with a
  trailing separator?  No matter what we put, *some* code is
  going to break, but the last choice might allow some code to
  accidentally work.  Is that good or bad?
 A pure virtual package having no source file, I think it should have no
 __file__ at all.  I don’t know if that would break more code than using
 an empty string for example, but it feels righter.
 
 I agree that the empty string is the worst of the choices.  no __file__ or
 __file__=None is better.

In some sense, I agree: hacks like empty strings are likely to lead to 
path-manipulation bugs where the wrong file gets opened (or worse, deleted, 
with predictable deleterious effects).  But the whole "pure virtual" mechanism 
here seems to pile even more inconsistency on top of an already irritatingly 
inconsistent import mechanism.  I was reasonably happy with my attempt to paper 
over PEP 302's weirdnesses from a user perspective:

http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html

(or https://launchpad.net/modules if you are not a Twisted user)

Users of this API can traverse the module hierarchy with certain expectations; 
each module or package would have .pathEntry and .filePath attributes, each of 
which would refer to the appropriate place.  Of course __path__ complicates 
things a bit, but so it goes.

Now it seems like "pure virtual" packages are going to introduce a new type of 
special case into the hierarchy which have neither .pathEntry nor .filePath 
objects.
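In the meantime, tools are stuck with defensive attribute access; a rough 
sketch of the kind of helper I mean (the function name is made up):

    import os

    def module_directory(mod):
        # A virtual/namespace package may have no __file__ at all, and an
        # empty-string __file__ would make dirname() return '' -- the
        # current directory -- which is exactly the path bug noted above.
        filename = getattr(mod, '__file__', None)
        if not filename:
            return None
        return os.path.dirname(os.path.abspath(filename))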

Rather than a one-by-one ad-hoc consideration of which attribute should be set 
to None or empty strings or what have you, I'd really like to see 
a discussion in the PEP saying what a package really is vs. what a module is, 
and what one can reasonably expect from it from an API and tooling perspective. 
 Right now I have to puzzle out the intent of the final API from the 
problem/solution description and thought experiment.

Despite authoring several namespace packages myself, I don't have any of the 
problems described in the PEP.  I just want to know how to write correct tools 
given this new specification.  I suspect that this PEP will be the only 
reference for how packages work for a long time coming (just as PEP 302 was 
before it) so it should really get this right.


Re: [Python-Dev] HTMLParser and HTML5

2011-07-29 Thread Glyph Lefkowitz

On Jul 29, 2011, at 7:46 AM, Stefan Behnel wrote:

 Joao S. O. Bueno, 29.07.2011 13:22:
 On Fri, Jul 29, 2011 at 1:37 AM, Stefan Behnel wrote:
 Brett Cannon, 28.07.2011 23:49:
 
 On Thu, Jul 28, 2011 at 11:25, Matt wrote:
 
 - What policies are in place for keeping parity with other HTML
 parsers (such as those in web browsers)?
 
 There aren't any beyond "it would be nice".
 [...]
 It's more of an issue of someone caring enough to do the coding work to
 bring the parser up to spec for HTML5 (or introduce new code to live
 beside
 the HTML4 parsing code).
 
 Which, given that html5lib readily exists, would likely be a lot more work
 than anyone who is interested in HTML5 handling would want to invest.
 
 I don't think we need a new HTML5 parsing implementation only to have it in
 the stdlib. That's the old sunny Java way of doing it.
 
 I disagree.
 Having proper html parsing out of the box is part of the "batteries
 included" thing.
 
 Well, you can easily prove me wrong by implementing this.
 
 Stefan

Please don't implement this just to prove Stefan wrong :).

The thing to do, if you want html parsing in the stdlib, is to _incorporate_ 
html5lib, which is already a perfectly good, thoroughly tested HTML parser, and 
simply deprecate HTMLParser and friends.  Implementing a new parser would serve 
no purpose I can see.
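For the curious, html5lib's interface is already about as simple as a stdlib 
module's would be (third-party install assumed):

    import html5lib

    # Parses invalid markup the way browsers do, per the HTML5 algorithm;
    # returns an xml.etree tree with the default treebuilder.
    tree = html5lib.parse("<p>unclosed <b>tags")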

-glyph



Re: [Python-Dev] HTMLParser and HTML5

2011-07-29 Thread Glyph Lefkowitz
On Jul 29, 2011, at 3:00 PM, Matt wrote:

 I don't see any real reason to drop a decent piece of code (HTMLParser, that 
 is) in favor of a third party library when only relatively minor updates are 
 needed to bring it up to speed with the latest spec.

I am not really one to throw stones here, as Twisted contains a lenient 
pseudo-XML parser which I still maintain - one which decidedly does not agree 
with html5's requirements for dealing with invalid data, but just a bunch of 
ad-hoc guesses of my own.

My impression of HTML5 is that HTMLParser would require significant 
modifications and possibly a drastic re-architecture in order to really do 
HTML5 right, especially the parts that the html5lib authors claim make HTML5 
streaming-unfriendly, i.e. subtree reordering when encountering certain types 
of invalid data.

But if I'm wrong about that, and there are just a few spec updates and bugfixes 
that need to be applied, by all means, ignore my comment.

-glyph




Re: [Python-Dev] Comments of the PEP 3151

2011-07-26 Thread Glyph Lefkowitz

On Jul 26, 2011, at 6:49 PM, Antoine Pitrou wrote:

 On Mon, 25 Jul 2011 15:28:47 +1000
 Nick Coghlan ncogh...@gmail.com wrote:
 There may be some error codes that we choose to map to these generic
 errors, even if we don't give them their own exception types at this
 point (e.g. ECONSHUTDOWN could map directly to ConnectionError).
 
 Ok, I can find neither ECONSHUTDOWN nor ECONNSHUTDOWN on
 www.opengroup.org, and it's not mentioned in errnomodule.c.  Is it some
 system-specific error code?

I assume that ESHUTDOWN is the errno in question?  (This is also already 
mentioned in the PEP.)
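For the record, the kind of mapping under discussion would look something like 
this sketch, assuming the generic ConnectionError class that PEP 3151 proposes:

    import errno

    # Fold ESHUTDOWN into the proposed generic ConnectionError rather
    # than giving it a dedicated exception subclass of its own.
    GENERIC_ERRNO_MAP = {errno.ESHUTDOWN: ConnectionError}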



Re: [Python-Dev] The socket HOWTO

2011-06-07 Thread Glyph Lefkowitz
On Jun 5, 2011, at 3:35 PM, Martin v. Löwis wrote:

 And that's all fine. I still claim that you have to *understand*
 sockets in order to use it properly. By this, I mean stuff like
 "what is a TCP connection?", "how is it established?", "how is UDP
 different from TCP?", "when data arrives, what layers of software
 does it go through?", "what is a port number?", etc.

Yes, these are all excellent concepts to be familiar with.  But the word 
"socket" (and the socket HOWTO) refers to a specific way to interface with 
those concepts, the Berkeley socket API: 
http://en.wikipedia.org/wiki/Berkeley_sockets.  Which you don't have to know 
anything about if you're going to use Twisted.  You should know about IPC in 
general, and TCP/UDP specifically if you're going to use Twisted, but sockets 
are completely optional.

Also, I feel that I should point out that the sockets HOWTO does not cover even 
a single one of these concepts in any useful depth.  If you think that these 
are what it should be explaining, it needs some heavy editing.  Here's what it 
has to say about each one:

 what is a TCP connection?

The only place that the characters "TCP" appear in the entire document is in 
the phrase "... which is completely different from TCP_NODELAY ...".  Nowhere 
is a TCP connection explained at a conceptual level, except to say that it's 
something a web browser does.

 how is UDP different from TCP?

The phrase "UDP" never appears in the HOWTO.  DGRAM sockets get a brief mention 
as "anything else" in the sentence: "... you'll get better behavior and 
performance from a STREAM socket than anything else".  (To be fair, I do 
endorse teaching that "the difference between TCP and UDP is that you should 
not use UDP" to anyone not sufficiently advanced to read the relevant reference 
documentation themselves.)

 when data arrives, what layers of software does it go through?

There's no discussion of this that I can find at all.

 what is a port number?

Aside from a few comments in the code examples, the only discussion of port 
numbers is "low number ports are usually reserved for 'well known' services 
(HTTP, SNMP etc)".

It would be very good to have a Python networking overview somewhere that 
explained this stuff at a very high level, and described how data might get 
into or out of your program, with links to things like the socket HOWTO that 
describe more specific techniques.  This would be useful because most commonly, 
I think that data will get into Python network programs via WSGI, not direct 
sockets or anything like Twisted.

To be clear, having read it now: I do _not_ agree with Antoine that this 
document should be deleted.  I dimly recall that it helped me understand some 
things in the very early days of Twisted.  While it's far from perfect, it 
might help someone in a similar situation understand those things as well 
today.  I just found it interesting that the main concepts one would associate 
with such a HOWTO are nowhere to be found :).

-glyph


Re: [Python-Dev] The socket HOWTO

2011-06-05 Thread Glyph Lefkowitz
On Jun 4, 2011, at 11:32 PM, Martin v. Löwis wrote:

 b) telling people to use Twisted or asyncore on the server side
   if they are new to sockets is bad advice. People *first* have
   to understand sockets, and *then* can use these libraries
   and frameworks. Those libraries aren't made to be black boxes
   that work even if you don't know how - you *have* to know how
   they work inside, or else you can't productively use them.


First, Twisted doesn't always use the BSD sockets API; the Windows IOCP 
reactor, especially, starts off with the socket() function, but things go off 
in a different direction pretty quickly from there.  So it's perfectly fine to 
introduce yourself to networking via Twisted, and many users have done just 
that.  If you're using it idiomatically, you should never encounter a socket 
object or file descriptor poking through the API anywhere.  Asyncore is 
different: you do need to know how sockets work in order to use it, because 
you're expected to call .send() and .recv() yourself.  (And, in my opinion, 
this is a serious design flaw, for reasons which will hopefully be elucidated 
in the PEP that Laurens is now writing.)

Second, it makes me a little sad that it appears to be folk wisdom that Twisted 
is only for servers.  A lot of work has gone into making it equally appropriate 
for clients.  This is especially true if your client has a GUI, where Twisted 
is often better than a protocol-specific library, which may either be blocking 
or have its own ad-hoc event loop.

I don't have an opinion on the socket HOWTO per se, only on the possibility of 
linking to Twisted as an alternate implementation mechanism.  It really would 
be better to say "go use Twisted rather than reading any of the following" than 
"read the following, which will help you understand Twisted".




Re: [Python-Dev] Python 3.x and bytes

2011-05-19 Thread Glyph Lefkowitz

On May 19, 2011, at 1:43 PM, Guido van Rossum wrote:

 -1; the result is not a *character* but an integer.

Well, really the result ought to be an octet, but I suppose adding an 'octet' 
type is beyond the scope of even this sprawling discussion :).

 I'm personally favoring using b'a'[0] and possibly hiding this in a constant 
 definition.

As someone who spends a frankly unfortunate amount of time handling protocols 
where things like this are necessary, I agree with this recommendation.  In 
protocols where one needs to compare network data with one-byte type 
identifiers or packet prefixes, more (documented) constants and less 
inscrutable junk like

if p == 'c':
   ...
elif p == 'j':
   ...
elif p == 'J': # for compatibility
   ...

would definitely be a good thing.  Of course, I realize that this sort of 
programmer will most likely replace those constants with 99, 106, 74 rather 
than take 
a moment to document what they mean, but at least they'll have to pause for a 
moment and realize that they have now lost _all_ mnemonics...
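For comparison, the named-constant version of the same dispatch (the message 
names and handlers here are hypothetical):

    def handle_connect(): pass   # stand-in handlers for the sketch
    def handle_join(): pass

    MSG_CONNECT = b'c'
    MSG_JOIN = b'j'
    MSG_JOIN_COMPAT = b'J'       # kept for older peers

    p = b'c'                     # one-byte tag sliced from the wire
    if p == MSG_CONNECT:
        handle_connect()
    elif p in (MSG_JOIN, MSG_JOIN_COMPAT):
        handle_join()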

In fact, I feel like I would want to push in the opposite direction: don't 
treat one-byte bytes slices less like integers; I wish I could more easily 
treat n-byte sequences _more_ like integers! :).  More protocols have 2-byte or 
4-byte network-endian packed integers embedded in them than have individual tag 
bytes that I want to examine.  For the typical ASCII-ish protocol where you 
want to look at command names and CRLF-separated messages, you'd never want to 
look at an individual octet; stringish operations like split() will give you 
what you want.
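The struct module is the usual spelling for that packed-integer case; a small 
sketch with an invented frame layout:

    import struct

    # '!' means network (big-endian) byte order; 'H' is an unsigned
    # 16-bit integer and 'I' an unsigned 32-bit integer.
    frame = b'\x00\x50\x00\x00\x00\x10rest-of-payload'
    (port,) = struct.unpack('!H', frame[:2])      # 80
    (length,) = struct.unpack('!I', frame[2:6])   # 16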



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Glyph Lefkowitz
On May 6, 2011, at 12:31 PM, Michael Foord wrote:

 pypy and .NET choose to arbitrarily break cycles rather than leave objects 
 unfinalised and memory unreclaimed. Not sure what Java does.

I think that's a mischaracterization of their respective collectors; 
"arbitrarily break cycles" implies that user code would see broken or 
incomplete objects, at least during finalization, which I'm fairly sure is not 
true on either .NET or PyPy.

Java definitely has a collector that can handle cycles too.  (None of these 
are reference counting.)

-glyph


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Glyph Lefkowitz
Apologies in advance for contributing to an obviously and increasingly 
off-topic thread, but this kind of FUD about GC is a pet peeve of mine.

On May 6, 2011, at 10:04 AM, Neal Becker wrote:

 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Counterpoint: http://lwn.net/Articles/268783/.  Sorry Linus, sometimes 
correctness matters more than performance.

But, even the performance argument is kind of bogus.  See, for example, this 
paper on real-time garbage collection: 
http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html.
  That's just one example of an easy-to-find solution to a problem that Linus 
holds up as unsolved or unsolvable.  There are solutions to pretty much all of 
the problems that Linus brings up.  One of these solutions is even famously 
implemented by CPython!  The CPython "string +=" idiom optimization fixes at 
least one case of the "you tend to always copy the node" antipattern Linus 
describes, and lots of languages (especially Scheme and derivatives, IIRC) have 
very nice optimizations around this area.  One could argue that any functional 
language without large pools of mutable state (e.g. Erlang) is a massive 
optimization for this case.
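Roughly, the CPython optimization in question: when the left-hand string's 
reference count is 1, += can resize the string in place instead of copying.  
Portable code should still prefer join(), since other runtimes may not do this:

    parts = ['spam'] * 10000

    s = ''
    for chunk in parts:
        s += chunk       # amortized in-place appends on CPython

    s2 = ''.join(parts)  # linear-time everywhere, by construction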

Another example: the "dirty cache" problem Linus talks about can be addressed 
by having a GC that cooperates with the VMM: 
http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf.

And the "re-using stuff as fast as possible" thing is exactly the kind of 
problem that generational GCs address.  When you run out of space in cache, you 
reap your first generation before you start copying stuff.  One of the key 
insights of generational GC is that you'll usually reclaim enough (in this 
case, cache-local) memory that you can keep going for a little while.  You 
don't have to read a super fancy modern paper on this, Wikipedia explains 
nicely: 
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29.
  Of course if you don't tune your GC at all for your machine-specific cache 
size, you won't see this performance benefit play out.

I don't know if there's a programming language and runtime with a real-time, 
VM-cooperating garbage collector that actually exists today which has all the 
bells and whistles required to implement an OS kernel, so I wouldn't give the 
Linux kernel folks too much of a hard time for still using C; but there's 
nothing wrong with the idea in the abstract.  The performance differences 
between automatic and manual GC are dubious at best, and with a really good GC 
and a language that supports it, GC tends to win big.  When it loses, it loses 
in ways which can be fixed in one area of the code (the GC) rather than 
millions of tiny fixes across your whole codebase, as is the case with 
strategies used by manual collection algorithms.

The assertion that "modern hardware is not designed for big data-structure 
pointer-chasing" is also a bit silly.  On the contrary, modern hardware has 
evolved staggeringly massive caches, specifically because large programs 
(whether they're GC'd or not) tend to do lots of this kind of thing, because 
there's a certain level of complexity beyond which one can no longer avoid it.  
It's old hardware, with tiny caches (that were, by virtue of their tininess, 
closer to the main instruction-processing silicon), that was optimized for the 
"carefully stack-allocating everything in the world to conserve cache" approach.

You can see this pretty clearly by running your favorite Python benchmark of 
choice on machines which are similar except for cache size.  The newer machine, 
with the bigger cache, will run Python considerably faster, but doesn't help 
the average trivial C benchmark that much - or, for that matter, Linux 
benchmarks.

-glyph



Re: [Python-Dev] the role of assert in the standard library ?

2011-04-28 Thread Glyph Lefkowitz

On Apr 28, 2011, at 12:59 PM, Guido van Rossum wrote:

 On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote:
 In my opinion assert should be avoided completely anywhere else than
 in the tests. If this is a wrong statement, please let me know why :)
 
 I would turn that around. The assert statement should not be used in
 unit tests; unit tests should use self.assertXyzzy() always. In
 regular code, assert should be about detecting buggy code. It should
 not be used to test for error conditions in input data. (Both these
 can be summarized as "if you still want the test to happen with -O,
 don't use assert.")

You're both right! :)  My take on assert is "don't use it, ever."

assert is supposed to be about conditions that never happen.  So there are a 
few cases where I might use it:

If I use it to enforce a precondition, it's wrong because under -OO my 
preconditions won't be checked and my input might be invalid.
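Concretely, a minimal sketch (the function is made up):

    def withdraw(amount):
        # Precondition via assert: silently skipped under "python -O",
        # so invalid input sails through in optimized runs.
        assert amount > 0, "amount must be positive"

    def withdraw_checked(amount):
        # Explicit check: enforced regardless of optimization flags.
        if amount <= 0:
            raise ValueError("amount must be positive")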

If I use it to enforce a postcondition, then my API's consumers have to 
occasionally handle this weird error, except it won't be checked under -OO so 
they won't be able to handle it consistently.

If I use it to try to make assertions about internal state during a 
computation, then I introduce an additional, untested (at the very least 
untested under -OO), probably undocumented (did I remember to say "and raises 
AssertionError when..." in its docstring?) code path where, when this bad 
thing happens, I get an exception instead of a result.

If that's an important failure mode, then there ought to be a documented 
exception, which the computation's consumers can deal with.

If it really should never happen, then I really should have just written some 
unit tests verifying that it doesn't happen in any case I can think of.  And I 
shouldn't be writing code to handle cases I can't come up with any way to 
exercise, because how do I know that it's going to do the right thing?  (If I 
had a dollar for every 'assert' message that didn't have the right number of 
arguments to its format string, etc.)

Also, when things that "should never happen" do actually happen in real life, 
is a random exception that interrupts the process actually an improvement over 
just continuing on with some potentially bad data?  In most cases, no, it 
really isn't, because by blowing up you've removed the ability of the user to 
take corrective action or do a workaround.  (In the cases where blowing up is 
better because you're about to do something destructive, again, a test seems in 
order.)

My python code is very well documented, which means that there is sometimes a 
significant runtime overhead from docstrings.  That's really my only interest 
in -OO: reducing memory footprint of Python processes by dropping dozens of 
megabytes of library documentation from each process.  The fact that it changes 
the semantics of 'assert' is an unfortunate distraction.

So the only time I'd even consider using 'assert' is in a throwaway script 
which might be run once, that I'm not going to write any tests for and I'm not 
going to maintain, but I might care about just enough to want to blow up 
instead of calling 'os.unlink' if certain conditions are not met.

(But then every time I actually use it that way, I realize that I should have 
dealt with the error sanely and I probably have to go back and fix it anyway.)



Re: [Python-Dev] python and super

2011-04-14 Thread Glyph Lefkowitz
On Apr 14, 2011, at 12:59 PM, Ronald Oussoren wrote:

 What would the semantics be of a super that (...)

I think it's long past time that this move to python-ideas, if you don't mind.



Re: [Python-Dev] Supporting Visual Studio 2010

2011-04-05 Thread Glyph Lefkowitz

On Apr 5, 2011, at 8:52 AM, exar...@twistedmatrix.com wrote:

 On 09:58 am, mar...@v.loewis.de wrote:
 Won't that still be an issue despite the stable ABI? Extensions on
 Windows should be linked to the same version of MSVCRT used to compile
 Python
 
 Not if they use the stable ABI. There still might be issues if you
 mix CRTs, but none related to the Python ABI - in particular, none
 of those crashing conditions can arise from the stable ABI.
 
 Does this mean new versions of distutils let you build_ext with any C 
 compiler, instead of enforcing the same compiler as it has done previously?  
 That would be great.

That *would* be great.  But is it possible?

http://www.python.org/dev/peps/pep-0384/ says "functions expecting FILE* are 
not part of the ABI, to avoid depending on a specific version of the Microsoft 
C runtime DLL on Windows".  Can extension modules that need to read and write 
files practically avoid all of those functions?  (If your extension module 
links a library with a different CRT, but doesn't pass functions back and forth 
to Python, is that OK?)

The PEP also says that it will allow users to check whether their modules 
conform to the ABI, but it doesn't say how that will be done.  How can we 
build extension modules so that we're sure we're ABI-conformant?
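As far as I can tell from the PEP, the intended mechanism is the Py_LIMITED_API 
define: compile with it defined, and anything outside the stable ABI is simply 
absent from the headers, so the C compiler checks conformance for you.  A build 
sketch, with the usual distutils wiring assumed:

    from distutils.core import setup, Extension

    setup(
        name='example',
        ext_modules=[
            Extension('example', ['example.c'],
                      define_macros=[('Py_LIMITED_API', None)]),
        ],
    )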


Re: [Python-Dev] Policy for making changes to the AST

2011-04-04 Thread Glyph Lefkowitz

On Apr 4, 2011, at 2:00 PM, Guido van Rossum wrote:

 On Mon, Apr 4, 2011 at 10:05 AM, fwierzbi...@gmail.com
 fwierzbi...@gmail.com wrote:
 As a re-implementor of ast.py that tries to be node for node
 compatible, I'm fine with #1 but would really like to have tests that
 will fail in test_ast.py to alert me!
 
 [and]
 
 On Mon, Apr 4, 2011 at 10:38 AM, Michael Foord
 fuzzy...@voidspace.org.uk wrote:
 A lot of tools that work with Python source code use ast - so even though
 other implementations may not use the same ast under the hood they will
 probably at least *want* to provide a compatible implementation. IronPython
 is in that boat too (although I don't know if we *have* a compatible
 implementation yet - we certainly feel like we *should* have one).
 
 Ok, so it sounds like ast is *not* limited to CPython?

Oh, definitely not.  I would be pretty dismayed if tools like 
http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/files/head:/Pyflakes/
 would not run on Jython & PyPy.
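
(For reference, the consumer pattern such tools rely on is tiny: parse source 
with the stdlib ast module and walk the resulting nodes.  A minimal sketch:)

    import ast

    tree = ast.parse("x = len('abc')")
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            print(node.id)   # prints 'x', then 'len'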



Re: [Python-Dev] Differences among Emacsen

2011-03-30 Thread Glyph Lefkowitz

On Mar 30, 2011, at 2:54 PM, Barry Warsaw wrote:

 On Mar 30, 2011, at 09:43 AM, Ralf Schmitt wrote:
 
 Barry Warsaw ba...@python.org writes:
 
 In case you missed it, there are now *three* Python modes.  Tim Peters'
 original and best (in my completely unbiased opinion wink) python-mode.el
 which is still being developed, the older but apparently removed from Emacs
 python.el and the 'new' (so I've heard) python.el.
 
 https://github.com/fgallina/python.el is the fourth one..
 
 Wonderful.

I have a plea for posterity: since I'm sure that a hundred people will see this 
post and decide that the best solution to this proliferation of Python plugins 
for Emacs is that there should be a new one that is even better than all these 
other ones (and also totally incompatible, of course)...

I won't try to stop you all from doing that, but please at least don't call it 
"python.el".  This is like if ActiveState, Wing, PyCharm and PyDev for Eclipse 
had all decided to call their respective projects "IDLE" because that's what 
you call a Python IDE :).  It would be nice to be able to talk about Python / 
Emacs code without having to do an Abbott and Costello routine.



Re: [Python-Dev] Finally switch urllib.parse to RFC3986 semantics?

2011-03-18 Thread Glyph Lefkowitz

On Mar 18, 2011, at 8:41 PM, Guido van Rossum wrote:

 Really. Do they still call them URIs? :-)

Well, by RFC 398*7* they're calling them IRIs instead.  'irilib', perhaps? ;-)



Re: [Python-Dev] funky buildbot

2011-03-10 Thread Glyph Lefkowitz
On Mar 10, 2011, at 3:18 PM, Bill Janssen wrote:

 It's a new Mac Mini running the latest Snow Leopard, with Python 2.6.1
 (the /usr/bin/python) and buildslave 0.8.3, using Twisted 8.2.0.

I realize that Python 2.6 is pretty old too, but a _lot_ of bugfixes have gone 
into Twisted since 8.2.  I'm not 100% sure this is a Twisted issue but you may 
want to try upgrading to 10.2.0 and see if that fixes things.  (I have a dim 
memory of similar issues which were eventually fixed by something in our 
subprocess support...)

-glyph




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-04 Thread Glyph Lefkowitz
On Fri, Mar 4, 2011 at 10:03 AM, Westley Martínez aniko...@gmail.com wrote:

 On Fri, 2011-03-04 at 00:54 -0800, Aaron DeVore wrote:
  On Thu, Mar 3, 2011 at 11:44 PM, Kerrick Staley m...@kerrickstaley.com
 wrote:
   That way, if the sysadmin does decide to replace the installed python
 file, he can do so without inadvertently deleting the previously installed
 binary.
 
  Nit pick: Change "he" to "they" to be gender neutral.

  Nit pick: Change "they" to "he" to be grammatically correct. If we
  really have to be gender neutral, change "he" to "he or she".


This grammatical rule is a modern fiction with no particular utility.  Go
ahead and use singular "they" as a gender-neutral pronoun; it was good
enough for Shakespeare, Twain, Austen and Shaw, so it should be good enough for
Python.

http://en.wikipedia.org/wiki/Singular_they#Examples_of_generic_they


Re: [Python-Dev] Import and unicode: part two

2011-01-20 Thread Glyph Lefkowitz

On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:

 On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan ncogh...@gmail.com wrote:
 On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross
 hodgestar+python...@gmail.com wrote:
 I'm changing my vote on this to a +1 for two reasons:
 
 * Initially I thought this wasn't supported by Python at all but I see
 that currently it is supported but that support is broken (or at least
 limited to UTF-8 filesystem encodings). Since support is there, might
 as well make it better (especially if it tidies up the code base at
 the same time).
 
 * I still don't think it's a good idea to give modules non-ASCII names
 but the consenting adults approach suggests we should let people
 shoot themselves in the foot if they believe they have good reason to
 do so.
 
 I'm also +1 on this for the reasons Simon gives.
 
 Same here. *Most* code will never be shared, or will only be shared
 between users in the same community. When it goes wrong it's also a
 learning opportunity. :-)

Despite my usual proclivity for being contrarian, I find myself in agreement 
here.  Linux users with locales that don't specify UTF-8 frankly _should_ have 
to deal with all kinds of nastiness until they can transcode their filesystems. 
 MacOS and Windows both have a right answer here and your third-party tools 
shouldn't create mojibake in your filenames.

However, I feel that we should not necessarily be making non-ASCII programmers 
second-class citizens, if they are to be supported at all.  The obvious outcome 
of the current regime is, if you want your code to work in the wider world, you 
have to make everything ASCII, so non-ASCII programmers have to do a huge 
amount of extra work to prepare their stuff for distribution.  As an English 
speaker I'd be happy about that, but as a person with a lot of Chinese in-laws, 
it gives me pause.

There is a difference between sharing code for inspection and editing (where a 
little codec pain is good for the soul: set your locale to UTF-8 and forget it 
already!) and sharing code so that a (non-programming) user can just run it.  
If I can write software in English and distribute it to Chinese people, fair's 
fair, they should be able to write it in Chinese and have it work on my 
computer.

To support the latter, could we just make sure that zipimport has a consistent, 
non-locale-or-operating-system-dependent interpretation of encoding?  That way 
a distributed egg would be importable from a zipfile regardless of how screwed 
up the distribution target machine's filesystem is.  (And this is yet more 
motivation for distributors to set zip_safe=True.)
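
A sketch of the distribution story that would enable - the module name lives in 
the archive's own directory, so the host filesystem's encoding is never 
consulted.  The archive name and contents here are hypothetical:

    import sys

    # 'dist/myapp.zip' is an imaginary archive containing café.py;
    # zipimport resolves module names from the zip's central directory,
    # not from the target machine's filesystem.
    sys.path.insert(0, 'dist/myapp.zip')
    import café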


Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Glyph Lefkowitz

On Jan 20, 2011, at 12:02 AM, Glenn Linderman wrote:

 But for local code, having to think up an ASCII name for a module rather than 
 use the obvious native-language name, is just brain-burden when creating the 
 code.

Is it really?  You already had to type 'import'; presumably, if you can think 
in Python you can think in ASCII.

(After my experiences with namespace crowding in Twisted, I'm inclined to 
suggest something more like 'import m_07117FE4A1EBD544965DC19573183DA2 as café' 
- then I never need to worry about 'café2' looking ugly or 'cafe' being 
incompatible :).)



Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Glyph Lefkowitz

On Jan 20, 2011, at 12:19 AM, Glenn Linderman wrote:

 Now if the stuff after m_ was the hex UTF-8 of 'café', that could get 
 interesting :)

(As it happens, it's the hex digest of the MD5 of the UTF-8 of café... ;-))
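
(For anyone who wants to check at home, a sketch of the scheme - hashlib is in 
the stdlib, and the .upper() is only there to match the spelling above:)

    import hashlib

    name = 'm_' + hashlib.md5('café'.encode('utf-8')).hexdigest().upper()
    print(name)   # should reproduce the m_... identifier, per the joke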


Re: [Python-Dev] devguide: Point out that OS X users need to change examples to use python.exe instead of

2011-01-10 Thread Glyph Lefkowitz

On Jan 10, 2011, at 1:37 PM, Łukasz Langa wrote:

 I'm using the case-sensitive variant of HFS+ since 10.4. It works, I like it 
 and you get ./python with it.

I realize that this isn't a popularity contest for this feature, but I feel 
like I should pipe up here and mention that it breaks some applications - for 
example, you can't really install World of Warcraft on a case-sensitive 
filesystem.  Not the filesystem's fault really, but it is a good argument for 
why users shouldn't choose it.



Re: [Python-Dev] Checking input range in time.asctime and time.ctime

2011-01-05 Thread Glyph Lefkowitz

On Jan 5, 2011, at 4:33 PM, Guido van Rossum wrote:

 Shouldn't the logic be to take the current year into account? By the
 time 2070 comes around, I'd expect 70 to refer to 2070, not to 1970.
 In fact, I'd expect it to refer to 2070 long before 2070 comes around.
 
 All of which makes me think that this is better left to the app, which
 can decide for itself whether it is more important to represent dates
 in the future or dates in the past.

The point of this somewhat silly flag (as I understood its description earlier 
in the thread) is to provide compatibility with POSIX two-digit years.  As per 
http://pubs.opengroup.org/onlinepubs/007908799/xsh/strptime.html - 

%y
is the year within century. When a century is not otherwise specified, values 
in the range 69-99 refer to years in the twentieth century (1969 to 1999 
inclusive); values in the range 00-68 refer to years in the twenty-first 
century (2000 to 2068 inclusive). Leading zeros are permitted but not required.

So, '70' means 1970, forever, in programs that care about this nonsense.
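
(Python's own strptime follows this rule, which makes for a quick sanity check:)

    import time

    print(time.strptime('70', '%y').tm_year)   # 1970
    print(time.strptime('68', '%y').tm_year)   # 2068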

Personally, by the time 2070 comes around, I hope that '70' will just refer to 
70 A.D., and get you odd looks if you use it in a written date - you might as 
well just write '0' :).




Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2011-01-03 Thread Glyph Lefkowitz

On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote:

 On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor alex.gay...@gmail.com wrote:
 No, it's singularly impossible to prove that any global load will be any 
 given
 value at compile time.  Any optimization based on this premise is wrong.
 
 True.
 
 My proposed way out of this conundrum has been to change the language
 semantics slightly so that global names which (a) coincide with a
 builtin, and (b) have no explicit assignment to them in the current
 module, would be fair game for such optimizations, with the
 understanding that the presence of e.g. len = len anywhere in the
 module (even in dead code!) would be sufficient to disable the
 optimization.
 
 But barring someone interested in implementing something based on this
 rule, the proposal has languished for many years.

Wouldn't this optimization break things like mocking out 'open' for testing via 
'module.open = fakeopen'?  I confess I haven't ever wanted to change 'len' but 
that one seems pretty useful.
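
The pattern I mean looks like this (a sketch of the common testing idiom, with 
a stand-in function rather than any particular library's API):

    import io
    import sys

    def read_config(path):
        with open(path) as f:   # 'open' resolves through module globals
            return f.read()

    # In a test, shadow the builtin through the module's namespace
    # (the equivalent of 'module.open = fakeopen'):
    this_module = sys.modules[__name__]
    this_module.open = lambda path: io.StringIO('stub contents')
    assert read_config('ignored') == 'stub contents'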

If CPython wants such optimizations, it should do what PyPy and its ilk do, 
which is to notice the assignment, but recompile code in that module to disable 
the fast path at runtime, preserving the existing semantics.



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:

 You end up proliferating types that all do the same kind of thing.  Judicious 
 use of inheritance helps, but getting the fundamental abstraction right is 
 hard.  Or least, Emacs hasn't found it in 20 years of trying.

Emacs hasn't even figured out how to do general-purpose iteration in 20 years 
of trying either.  The easiest way I've found to loop across an arbitrary pile 
of 'stuff' is the CL 'loop' macro, which you're not even supposed to use.  Even 
then, you still have to make the arcane and pointless distinction between 
'across', 'in' and 'on'.  Python, on the other hand, has iteration tied up 
nicely in a bow.

I don't know how to respond to the rest of your argument.  Nothing you've said 
has in any way indicated to me why having code-point offsets is a good idea, 
only that people who know C and elisp would rather sling around piles of 
integers than have good abstract types.

For example:

 I think it more likely that markers are very expense to create and use 
 compared to integers.

What?  When you do 'for x in str' in Python, you are already creating an 
iterator object, which has to store the exact same amount of state that our 
proposed 'marker' or 'character pointer' would have to store.  The proposed 
UTF-8 marker would have to do a tiny bit more work when iterating because it 
would have to combine multibyte characters, but in exchange for that you get to 
skip a whole ton of copying when encoding and decoding.  How is this expensive 
to create and use?  For every application I have ever designed, encountered, or 
can even conjecture about, this would be cheaper.  (Assuming not just a UTF-8 
string type, but one for UTF-16 as well, where native data is in that format 
already.)
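
To make that concrete, here is a minimal sketch of such a traversal over raw 
UTF-8 bytes.  It assumes well-formed input, and 'iter_codepoints' is an 
invented name, not a proposed API:

    def iter_codepoints(data):
        # data: a bytes object holding well-formed UTF-8.
        i = 0
        while i < len(data):
            b = data[i]
            if b < 0x80:
                n = 1   # ASCII
            elif b < 0xE0:
                n = 2   # two-byte sequence
            elif b < 0xF0:
                n = 3   # three-byte sequence
            else:
                n = 4   # four-byte sequence
            yield data[i:i+n].decode('utf-8')
            i += n

    # list(iter_codepoints('café'.encode('utf-8'))) == ['c', 'a', 'f', 'é']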

For what it's worth, not wanting to use abstract types in Emacs makes sense to 
me: I've written my share of elisp code, and it is hard to create reasonable 
abstractions in Emacs, because the facilities for defining types and creating 
polymorphic logic are so crude.  It's a lot easier to just assume your 
underlying storage is an array, because at the end of the day you're going to 
need to call some functions on it which care whether it's an array or an alist 
or a list or a vector anyway, so you might as well just say so up front.  But 
in Python we could just call 'mystring.by_character()' or 
'mystring.by_codepoint()' and get an iterator object back and forget about all 
that junk.



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote:

 Greg Ewing writes:
 On 24/11/10 22:03, Stephen J. Turnbull wrote:
 But
 if you actually need to remember positions, or regions, to jump to
 later or to communicate to other code that manipulates them, doing
 this stuff the straightforward way (just copying the whole iterator
 object to hang on to its state) becomes expensive.
 
 If the internal representation of a text pointer (I won't call it
 an iterator because that means something else in Python) is a byte
 offset or something similar, it shouldn't take up any more space
 than a Python int, which is what you'd be using anyway if you
 represented text positions by grapheme indexes or whatever.
 
 That's not necessarily true.  Eg, in Emacs (there you go again),
 Lisp integers are not only immediate (saving one pointer), but the
 type is encoded in the lower bits, so that there is no need for a type
 pointer -- the representation is smaller than the opaque marker type.
 Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of
 24 bytes on a 64-bit platform.

Yes, yes, Lisp is very clever.  Maybe some other runtime, like PyPy, could make 
this optimization.  But I don't think that anyone is filling up main memory 
with gigantic piles of character indexes and needs to squeeze out that extra 
couple of bytes of memory on such a tiny object.  Plus, this would allow such a 
user to stop copying the character data itself just to decode it, and on 
mostly-ASCII UTF-8 text (a common use case) this is a 2x savings right off the 
bat.

 In Python it's true that markers can use the same data structure as
 integers and simply provide different methods, and it's arguable that
 Python's design is better.  But if you use bytes internally, then you
 have problems.

No, you just have design questions.

 Do you expose that byte value to the user?

Yes, but only if they ask for it.  It's useful for computing things like 
quotas.

 Can users (programmers using the language and end users) specify positions in 
 terms of byte values?

Sure, why not?

 If so, what do you do if the user specifies a byte value that points into a 
 multibyte character?

Go to the beginning of the multibyte character.  Report that position; if the 
user then asks the requested marker object for its position, it will report 
that byte offset, not the originally-requested one.  (Obviously, do the same 
thing for surrogate pair code points.)
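
In code, the snapping rule is nearly a one-liner, since UTF-8 continuation 
bytes are exactly the ones matching 0b10xxxxxx.  A sketch, with an invented 
name:

    def snap_to_character(data, i):
        # data: bytes holding well-formed UTF-8; i: an arbitrary byte offset.
        # Back up while pointing into the middle of a multibyte character.
        while 0 < i < len(data) and data[i] & 0xC0 == 0x80:
            i -= 1
        return i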

 What if the user wants to specify position by number of characters?

Part of the point that we are trying to make here is that nobody really cares 
about that use-case.  In order to know anything useful about a position in a 
text, you have to have traversed to that location in the text. You can remember 
interesting things like the offsets of starts of lines, or the x/y positions of 
characters.

 Can you translate efficiently?

No, because there's no point :).  But you _could_ implement an overlay that 
cached things like the beginning of lines, or the x/y positions of interesting 
characters.
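
(A sketch of one such overlay - scan once, cache the line-start byte offsets, 
and later line lookups are cheap.  'line_starts' is an invented name:)

    def line_starts(data):
        # data: bytes; returns the byte offset at which each line begins.
        starts = [0]
        i = data.find(b'\n')
        while i != -1:
            starts.append(i + 1)
            i = data.find(b'\n', i + 1)
        return starts

    # line_starts(b'ab\ncd\n') == [0, 3, 6]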

 As I say elsewhere, it's possible that there really never is a need to 
 efficiently specify an absolute position in a large text as a character 
 (grapheme, whatever) count.

 But I think it would be hard to implement an efficient text-processing 
 *language*, eg, a Python module
 for *full conformance* in handling Unicode, on top of UTF-8.

Still: why?  I guess if I have some free time I'll try my hand at it, and maybe 
I'll run into a wall and realize you're right :).

 Any time you have an algorithm that requires efficient access to arbitrary 
 text positions, you'll spend all your skull sweat fighting the 
 representation.  At least, that's been my experience with Emacsen.

What sort of algorithm would that be, though?  The main thing that I could 
think of is a text editor trying to efficiently allow the user to scroll to the 
middle of a large file without reading the whole thing into memory.  But, in 
that case, you could use byte positions to estimate, and display a heuristic 
number while calculating the real line numbers.  (This is what 'less' does, and 
it seems to work well.)

 So I don't really see what you're arguing for here. How do
 *you* think positions in unicode strings should be represented?
 
 I think what users should see is character positions, and they should
 be able to specify them numerically as well as via an opaque marker
 object.  I don't care whether that position is represented as bytes or
 characters internally, except that the experience of Emacsen is that
 representation as byte positions is both inefficient and fragile.  The
 representation as character positions is more robust but slightly more
 inefficient.

Is it really the representation as byte positions which is fragile (i.e. the 
internal implementation detail), or the exposure of that position to calling 
code, and the idiomatic usage of that number as an integer?


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:37 AM, ben.cottr...@nominum.com wrote:

 I'd prefer not to think of the number of times I've made the following 
 mistake:
 
 s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)

If it's any consolation, it's fewer than the number of times I have :).

(More fun, actually, is where you pass a file descriptor to the wrong argument 
of 'fromfd'...)
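
(For the record, family comes first, then type - and the swap is especially 
insidious because both constants are small integers, so the swapped call may 
silently succeed and hand you something other than what you meant:)

    import socket

    # Correct: address family first, then socket type.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.close()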



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote:

 Well, it is easy to assign range(N) to a tuple of names when desired. I
 don't think an automatically-enumerating constant generator is needed.

I don't think that numerical enumerations are the only kind of constants we're 
talking about.  Others have already mentioned strings.  Also, see 
http://tm.tl/4671 for some other use-cases.  Since this isn't coming to 2.x, 
we're probably going to do our own thing anyway (unless it turns out that 
flufl.enum is so great that we want to add another dependency...) but I'm 
hoping that the outcome of this discussion will point to something we can be 
compatible with.
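
(To illustrate: the range(N) idiom works, but the values it produces are 
anonymous, which is part of what a constant type would fix.  Names here are 
invented:)

    # The "assign range(N) to a tuple of names" idiom:
    PENDING, RUNNING, DONE = range(3)

    # ...but the values forget their names, which hurts in logs and tracebacks:
    print(RUNNING)   # prints 1, not 'RUNNING'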



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 7:22 PM, James Y Knight wrote:

 On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
 Maybe Python should have used UTF-8 as its internal unicode
 representation. Then people who were foolish enough to assume
 one character per string item would have their programs break
 rather soon under only light unicode testing. :-)
 
 You put a smiley, but, in all seriousness, I think that's actually the right 
 thing to do if anyone writes a new programming language. It is clearly the 
 right thing if you don't have to be concerned with backwards-compatibility: 
 nobody really needs to be able to access the Nth codepoint in a string in 
 constant time, so there's not really any point in storing a vector of 
 codepoints.
 
 Instead, provide bidirectional iterators which can traverse the string by 
 byte, codepoint, or by grapheme (that is: the set of combining characters + 
 base character that go together, making up one thing which a human would 
 think of as a character).


I really hope that this idea is not just for new programming languages.  If you 
switch from doing Unicode "wrong" to doing Unicode "right" in Python, you 
quadruple the memory footprint of programs which primarily store and manipulate 
large amounts of text.
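
(The back-of-the-envelope version of that claim, assuming a UCS-4 "wide" build 
of the sort most Linux distributions shipped at the time:)

    text = 'a mostly-ASCII document, repeated ' * 1000

    utf8_size = len(text.encode('utf-8'))   # ~1 byte per character here
    ucs4_size = len(text) * 4               # 4 bytes per code point
    print(ucs4_size / utf8_size)            # -> 4.0 for pure-ASCII text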

This is especially ridiculous in PyGTK applications, where the internal 
representation required by the GUI is UTF-8 anyway, so the round-tripping of 
string data back and forth to the exploded UTF-32 representation is wasting 
gobs of memory and time.  It at least makes sense when your C library's idea 
about character width and your Python build match up.

But in a desktop app this is unlikely to be a performance concern; in servers 
it's a big deal, measurably so.  I am pretty sure that in the server apps that 
I work on, we are eventually going to need our own string type and UTF-8 logic 
that does exactly what James suggested - certainly if we ever hope to support 
Py3.

(I dimly recall that both James and I have made this point before, but it's 
pretty important, so it bears repeating.)



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:

 On Tue, 23 Nov 2010 00:07:09 -0500
 Glyph Lefkowitz gl...@twistedmatrix.com wrote:
 On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto 
 ocean-c...@m2.ccsnet.ne.jp wrote:
 
 Hello. Does this affect python? Thank you.
 
 http://www.openssl.org/news/secadv_20101116.txt
 
 
 No.
 
 Well, actually it does, but Python links against the system OpenSSL on
 most platforms (except Windows), so it's up to the OS vendor to apply
 the patch.


It does?  If so, I must have misunderstood the vulnerability.  Can you explain 
how it affects Python?





Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote:

 James Y Knight writes:
 
 You put a smiley, but, in all seriousness, I think that's actually
 the right thing to do if anyone writes a new programming
 language. It is clearly the right thing if you don't have to be
 concerned with backwards-compatibility: nobody really needs to be
 able to access the Nth codepoint in a string in constant time, so
 there's not really any point in storing a vector of codepoints.
 
 A sad commentary on the state of Emacs usage, "nobody".
 
 The theory is that accessing the first character of a region in a
 string often occurs as a primitive operation in O(N) or worse
 algorithms, sometimes without enough locality at the collection of
 regions level to give a reasonably small average access time.

I'm not sure what you mean by "the theory is".  Whose theory?  About what?

 In practice, any *Emacs user can tell you that yes, we do need to be
 able to access the Nth codepoint in a buffer in constant time.  The
 O(N) behavior of current Emacs implementations means that people often
 use a binary coding system on large files.  Yes, some position caching
 is done, but if you have a large file (eg, a mail file) which is
 virtually segmented using pointers to regions, locality gets lost.
 (This is not a design bug, this is a fundamental requirement: consider
 fast switching between threaded view and author-sorted view.)

Sounds like a design bug to me.  Personally, I'd implement fast switching 
between threaded view and author-sorted view the same way I'd address any 
other multiple-views-on-the-same-data problem.  I'd retain data structures for 
both, and update them as the underlying model changed.

These representations may need to maintain cursors into the underlying 
character data, if they must retain giant wads of character data as an 
underlying representation (arguably the _main_ design bug in Emacs, that it 
encourages you to do that for everything, rather than imposing a sensible 
structure), but those cursors don't need to be code-point counters; they could 
be byte offsets, or opaque handles whose precise meaning varied with the 
potentially variable underlying storage.

Also, please remember that Emacs couldn't be implemented with giant Python 
strings anyway: crucially, all of this stuff is _mutable_ in Emacs.

 And of course an operation that sorts regions in a buffer using
 character pointers will have the same problem.  Working with memory
 pointers, OTOH, sucks more than that; GNU Emacs recently bit the
 bullet and got rid of their higher-level memory-oriented APIs, all of
 the Lisp structures now work with pointers, and only the very
 low-level structures know about character-to-memory pointer
 translation.
 
 This performance issue is perceptible even on 3GHz machines with not
 so large (50MB) mbox files.  It's *horrid* if you do something like
 occur on a 1GB log file, then try randomly jumping to detected log
 entries.

Case in point: 'occur' needs to scan the buffer anyway; you can't do better 
than linear time there.  So you're going to iterate through the buffer, using 
one of the techniques that James proposed, and remember some locations.  Why 
not just have those locations be opaque cursors into your data?

In summary: you're right, in that James missed a spot.  You need bidirectional, 
*copyable* iterators that can traverse the string by byte, codepoint, grapheme, 
or decomposed glyph.
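
A sketch of the shape I mean, with the copying made explicit.  This steps by 
codepoint over UTF-8 bytes only - byte and grapheme traversal would layer onto 
the same structure - and every name here is invented (bounds checks elided):

    class Cursor(object):
        def __init__(self, data, offset=0):
            self.data = data        # bytes: well-formed UTF-8
            self.offset = offset    # always left at a character boundary

        def copy(self):
            # Cheap to remember a position: two references and an integer.
            return Cursor(self.data, self.offset)

        def forward(self):
            self.offset += 1
            while (self.offset < len(self.data)
                   and self.data[self.offset] & 0xC0 == 0x80):
                self.offset += 1    # skip continuation bytes

        def backward(self):
            self.offset -= 1
            while self.offset > 0 and self.data[self.offset] & 0xC0 == 0x80:
                self.offset -= 1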



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-22 Thread Glyph Lefkowitz
On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto 
ocean-c...@m2.ccsnet.ne.jp wrote:

 Hello. Does this affect python? Thank you.

 http://www.openssl.org/news/secadv_20101116.txt


No.

