Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Aug 29, 2014, at 7:44 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:

> Disabling verification entirely externally to the program, through a CLI flag or environment variable.

I'm pretty down on this idea. The problem is that it's a pretty blunt instrument to swing, and it's almost impossible to imagine it not hitting things it shouldn't; it's far too likely to be used in applications that make two sets of outbound connections: 1) to some internal service on which you want to disable verification, and 2) to some external service which needs strong validation. A global flag causes the latter to fail silently when subjected to a MITM attack, and that's exactly what we're trying to avoid.

It also makes things much harder for library authors: I write an API client for some API, and make TLS connections to it. I want those to be verified by default. I can't even rely on the httplib defaults, because someone might disable them from the outside. I would strongly recommend against such a mechanism.

For what it's worth, Twisted simply unconditionally started verifying certificates in 14.0, with no disable switch, and (to my knowledge) literally no users have complained. Twisted has a very, very strict backwards compatibility policy. For example, I once refused to accept the deletion of a class that raised an exception upon construction, on the grounds that someone might have been inadvertently importing that class, and they shouldn't see an exception until they've seen a deprecation for one release. Despite that, we classified failing to verify certificates as a security bug, and fixed it with no deprecation period. When users type the 's' after the 'p' and before the ':' in a URL, they implicitly expect browser-like certificate verification.
The lack of complaints is despite the fact that 14.0 has been out for several months now, and, thanks to the aforementioned strict policy, users tend to upgrade fairly often (since they know they can almost always do so without fear of application-breaking consequences). According to PyPI metadata, 14.0.0 has had 273,283 downloads so far.

Furthermore, "disable verification" is a nonsensical thing to do with TLS; "select a trust root" is a valid configuration option, and OpenSSL already provides it via the SSL_CERT_DIR environment variable, so there's no need for Python to provide anything beyond that.

-glyph

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
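To make the "select a trust root, don't disable verification" distinction concrete, here is a sketch using the stdlib ssl module as it shipped after PEP 476 (the internal-CA file path is a hypothetical example):

```python
import ssl

# PEP 476 behavior: a default context verifies certificates and hostnames.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname

# "Select a trust root" is the legitimate knob. OpenSSL already honors
# SSL_CERT_DIR / SSL_CERT_FILE; the env variable names are visible here:
print(ssl.get_default_verify_paths())

# To trust a specific internal CA instead of the system set, you would
# load it explicitly rather than disabling verification, e.g.:
#     ctx.load_verify_locations(cafile="/etc/my-internal-ca/ca.pem")
```

The point is that every option here changes *which* certificates are trusted, never *whether* the peer is authenticated at all.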
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Sep 2, 2014, at 4:01 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 08:18, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> Antoine Pitrou <solipsis at pitrou.net> writes:
>>> And how many people are using Twisted as an HTTPS client? (compared to e.g. Python's httplib, and all the third-party libraries building on it?)
>> I don't think anyone could give an honest estimate of these counts, however there's two factors to bear in mind: a) It's extremely strongly recommended to use requests to make any HTTP requests, precisely because httplib is negligent in certificate and hostname checking by default, b) We're talking about Python 3, which has fewer users than Python 2.
> Creating *new* incompatibilities between Python 2 & Python 3 is a major point of concern. One key focus of 3.5 is *reducing* barriers to migration, and this PEP would be raising a new one.

No. Providing the security that the user originally asked for is not a backwards-incompatible change. It is a bug fix. And believe me: I care a _LOT_ about reducing barriers to migration. This would not be on my list of the top 1000 things that make migration difficult.

> It's a change worth making, but we have time to ensure there are easy ways to do things like skipping cert validation, or tolerate expired certificates.

The API already supports both of these things. What I believe you're implicitly saying is that there needs to be a way to do this without editing code, and... no, there really doesn't. Not to mention the fact that you could already craft a horrific monkeypatch to allow operators to cause the ssl module to malfunction by 'pip install'ing a separate package, which is about as supported as this should be.

-glyph
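For what it's worth, the shape of such a "horrific monkeypatch" is a two-liner, because the stdlib's private context-factory hook (the one PEP 476 itself ended up documenting as the opt-out) can simply be reassigned. This is a sketch of why a code-level escape hatch already exists, not a recommendation:

```python
import ssl

# The monkeypatch: swap the default HTTPS context factory for the
# unverified one, so stdlib clients silently stop verifying peers.
# Do not ship this; it exists to show how blunt the instrument is.
ssl._create_default_https_context = ssl._create_unverified_context

ctx = ssl._create_default_https_context()
assert ctx.verify_mode == ssl.CERT_NONE   # verification is now off globally
assert not ctx.check_hostname
```

An operator who genuinely needs this can 'pip install' a package that does exactly the above at import time, which is about the right level of support for it.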
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Sep 2, 2014, at 4:28 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 09:08, David Reid <dr...@dreid.org> wrote:
>> Nick Coghlan <ncoghlan at gmail.com> writes:
>>> Creating *new* incompatibilities between Python 2 & Python 3 is a major point of concern.
>> Clearly this change should be backported to Python 2.
> Proposing to break backwards compatibility in a maintenance release (...)

As we keep saying, this is not a break in backwards compatibility; it's a bug fix. Yes, systems might break, but that breakage represents an increase in security which may well be operationally important. Not everyone with a working application has the relevant understanding and expertise to know that Python's HTTP client is exposing them to surveillance. These applications should break. That is the very nature of the fix. It is not a compatibility break that the system starts correctly rejecting invalid connections.

By way of analogy, here's another kind of breach in security: an arbitrary remote code execution vulnerability in XML-RPC. I think we all agree that any 0-day RCE vulnerabilities in Python really ought to be fixed, and that fixes for them could legitimately be included without worrying about backwards compatibility breaks. (At least... gosh, I hope so.)

Perhaps this arbitrary remote execution looks harmless: the use of an eval() instead of an int() someplace. Perhaps someone discovered that they can do "3 + 4" in their XML-RPC requests and the server does the computation for them. Great! They start relying on this in their applications to use symbolic values in their requests instead of having explicit enumerations. This can save you quite a bit of code! When the RCE is fixed, this application will break, and that's fine. In fact, that's the whole point of issuing the fix: that people will no longer be able to make arbitrary computation requests of your server any more.
If that server's maintainer has the relevant context and actually wants the XML-RPC endpoint to enable arbitrary RCE, they can easily modify their application to start doing eval() on the data that they received, just as someone can easily modify their application to intentionally disable all connection security. (Let's stop calling it "certificate verification", because that sounds like some kind of clerical detail: if you disable certificate verification, TLS connections are unauthenticated and unidentified, and therefore insecure.)

For what it's worth, on the equivalent Twisted change, I originally had just these concerns, but my mind was changed when I considered what exactly the user-interface ramifications were for people typing that 's' for 'secure' in URLs. I was convinced, and we made the change, and there have been no ill effects that I'm aware of as a result. In fact, there has been a renewed interest in Twisted for HTTP client work, because we finally made security work more or less like it's supposed to, and the standard library is so broken.

I care about the health of the broader Python community, so I will passionately argue that this change should be made. But for me personally, it's a lot easier to justify that everyone should use Twisted (at least since 14+), because transport security in the stdlib is such a wreck, and even if it gets fixed it's going to have easy options to turn it off unilaterally, so your application can never really be sure it's getting transport security when it's requesting transport security.

-glyph
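To make the analogy concrete, here is a sketch of the hypothetical XML-RPC parameter handler before and after the fix (the function names and the "3 + 4" client habit are invented for illustration):

```python
def parse_count_unsafe(value):
    # The "RCE feature": eval() where int() belongs. "3 + 4" happens to
    # work, so clients start depending on server-side computation.
    return eval(value)

def parse_count_fixed(value):
    # The security fix. Clients that sent "3 + 4" now break, which is
    # precisely the point of issuing the fix.
    return int(value)

assert parse_count_unsafe("3 + 4") == 7
assert parse_count_fixed("7") == 7
try:
    parse_count_fixed("3 + 4")
except ValueError:
    pass  # the symbolic-expression "feature" is gone, as intended
```

A maintainer who truly wants the old behavior can reintroduce eval() in their own code, exactly as a maintainer who wants insecure TLS can configure it explicitly.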
[Python-Dev] Language Summit Follow-Up
At the language summit, Alex and I volunteered to put together some recommendations on what changes could be made to Python (the language) in order to facilitate a smoother transition from Python 2 to Python 3. One of the things that motivated this was the (surprising, to us) consideration that features like ensurepip might be added to future versions of the 2.7 installers from python.org.

The specific motivations for writing this are:

- Library maintainers have a rapidly expanding matrix that requires an increasing number of branches to satisfy.
- People with large corporate codebases absolutely cannot port all at once.
- If you don't have perfect test coverage, then you can't make any progress.

So these changes are intended to make porting from Python 2 to Python 3 more guided and incremental. We believe that these attributes are necessary.

We would like to stress that we don't believe anything on this list is as important as the continuing efforts that everyone in the broader ecosystem is making. If you just want to ease the transition by working on anything at all, the best use of your time right now is porting https://warehouse.python.org/project/MySQL-python/ to Python 3. :)

Nevertheless, there are some things that the language and CPython could do. Unfortunately, we had to reject any proposal that involved new __future__ imports, since unknown __future__ imports are un-catchable SyntaxErrors.

Here are some ideas for Python 2.7+:

1. Add ensurepip to the installers. Having pip reliably available increases the availability of libraries that help with porting, and will generally strengthen the broader ecosystem in the (increasingly long) transition period.

2. Add some warnings about Python 3 compatibility. It should at least be possible to get a warning for every:
   - implicit string coercion,
   - old-style class,
   - old-style division,
   - print statement,
   - old-style exception syntax,
   - use of buffer(),
   - bytes(memoryview(b'abc')),
   - import from an old stdlib location (see point 4),
   - long integer literal,
   - use of a variable beyond the lifetime of an 'except Exception as e' block or a list comprehension.

3. Backport 'yield from' to allow people to use Tulip and Tulip-compatible code, and to facilitate the development of Tulip-friendly libraries and a Tulip ecosystem. A robust Tulip ecosystem requires the participation of people who are not yet using Python 3.

4. Add aliases for the renamed modules in the stdlib. This will allow people to just write Python 3 in a lot more circumstances.

5. (re-)Enable warnings by default, including enabling -3 warnings. Right now all warnings are silent by default, which greatly reduces discoverability of future compatibility issues. I hope it's not controversial to say that most new Python code is still being written against Python 2.7 today; if people are writing that code in such a way that it's not 3-friendly, it should be a more immediately noticeable issue.

6. Get rid of 2to3. Particularly, of any discussion of using 2to3 in the documentation. More than one very experienced, well-known Python developer in this discussion has told me that they thought 2to3 was the blessed way to port their code, and it's no surprise that they think so, given that the first technique https://docs.python.org/3/howto/pyporting.html mentions is still 2to3. We should replace 2to3 with something like https://github.com/mitsuhiko/python-modernize. 2to3 breaks your code on Python 2, and doesn't necessarily get it running on Python 3. A more conservative approach that reduced the amount of work to get your code 2/3 compatible, but was careful to leave everything working, would be a lot more effective.

7. Add a new 'bytes' type that actually behaves like the Python 3 bytes type (bytes(5)).

We have rejected any changes for Python 3.5, simply because of the extremely long time it would take to get those features into users' hands.
Any changes for Python 3 that we're proposing would need to get into a 3.4.x release, so that, for example, they can make their way into Ubuntu 14.04 LTS. Here are some ideas for Python 3.4.x:

1. Usage of Python 2 style syntax (for example, a print statement) or stdlib module names (for example, 'import urllib2') should result in a specific, informative warning, not a generic SyntaxError/ImportError. This will really help new users.

2. Add 'unicode' back as an alias for 'str'. Just today I was writing some documentation where I had to resort to some awkward encoding tricks just to get a bytes object out, without explaining the whole 2/3 dichotomy in some unrelated prose.

We'd like to thank all the individuals who gave input and feedback in creating this list.

-glyph & Alex Gaynor
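As a sketch of what these aliases would smooth over, here is the kind of shim that 2/3-straddling code writes by hand today (the alias names mirror the proposals above; nothing here is provided by the stdlib):

```python
import sys

if sys.version_info[0] >= 3:
    unicode = str   # what "add 'unicode' back as an alias for 'str'" would provide
    long = int      # the same idea for the removed long type

assert unicode("abc") == "abc"
assert long(5) == 5

# And the constructor mismatch a backported Python-3-style bytes type
# would address: on Python 3, bytes(5) is a zero-filled buffer,
assert bytes(5) == b"\x00" * 5
# while on Python 2, bytes is an alias for str, so bytes(5) == '5'.
```

Every library currently carries some variant of this boilerplate; building the aliases in would let most of it be deleted.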
Re: [Python-Dev] PEP 418 is too divisive and confusing and should be postponed
On Apr 7, 2012, at 3:40 AM, Steven D'Aprano wrote:

> In any case, NTP is not the only thing that adjusts the clock, e.g. the operating system will adjust the time for daylight savings.

Daylight savings time is not a clock adjustment, at least not in the sense in which this thread has mostly been using the word "clock". It doesn't affect the seconds-from-epoch measurement; it affects the way in which the clock is formatted to the user.

-glyph
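A quick illustration of that distinction: the epoch-seconds reading is unaffected by DST, and DST only appears on the formatting side.

```python
import time

t = time.time()            # seconds since the epoch: DST-independent
utc = time.gmtime(t)       # the same instant, rendered without any DST
local = time.localtime(t)  # DST shows up only here, as formatting
print(utc.tm_hour, local.tm_hour, local.tm_isdst)

# The underlying measurement is fixed: second 0 of the epoch is
# 1970-01-01T00:00:00 UTC regardless of any DST rule.
assert time.gmtime(0).tm_year == 1970
```

When DST begins, time.time() keeps ticking smoothly; only the localtime rendering jumps by an hour.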
Re: [Python-Dev] this is why we shouldn't call it a monotonic clock (was: PEP 418 is too divisive and confusing and should be postponed)
On Apr 5, 2012, at 8:07 PM, Zooko Wilcox-O'Hearn wrote:

> On Thu, Apr 5, 2012 at 7:14 PM, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
>> This is the strict mathematical meaning of the word monotonic, but the way it's used in relation to OS clocks, it seems to mean rather more than that.
> Yep. As far as I can tell, nobody has a use for an unsteady, monotonic clock. There seem to be two groups of people:
> 1. Those who think that "monotonic clock" means a clock that never goes backwards. These people are in the majority. After all, that's what the word "monotonic" means. However, a clock which guarantees *only* this is useless.

While this is a popular view on this list and in this discussion, it is also a view that seems to contradict quite a lot that has been written on the subject, and seems contrary to the usual jargon when referring to clocks.

> 2. Those who think that "monotonic clock" means a clock that never jumps, and that runs at a rate approximating the rate of real time. This is a very useful kind of clock to have! It is what C++ now calls a "steady clock". It is what all the major operating systems provide.

All clocks run at a rate approximating the rate of real time. That is very close to the definition of the word "clock" in this context. All clocks have flaws in that approximation, and really those flaws are the whole point of access to distinct clock APIs; different applications can cope with different flaws. There seems to be a persistent desire in this discussion to specify and define these flaws out of existence, where this API really should instead be embracing and classifying the flaws. (Victor is doing a truly amazing job with the PEP in that regard; it's already the first web search hit on every search engine I've tried for more than half of these terms.)
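For concreteness, here is how the "embrace and classify the flaws" approach looks in the API PEP 418 proposed (a sketch assuming the PEP's time.monotonic() and the per-clock metadata of time.get_clock_info(), as they later shipped):

```python
import time

# Mathematical monotonicity: successive readings never go backwards...
a = time.monotonic()
b = time.monotonic()
assert b >= a

# ...but the other, more useful properties are per-platform flaws, and
# the API reports them rather than defining them out of existence:
info = time.get_clock_info("monotonic")
print(info.monotonic, info.adjustable, info.resolution)
```

Nothing in that metadata can promise steadiness across a suspend; it can only tell you which flaws this platform's clock is known to have.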
"Steadiness", in the C++ sense, only applies to most OS clocks that are given the label of "monotonic" during the run of a single program on a single computer, while that computer is running at some close approximation of full power. As soon as you close your laptop lid, the property of steadiness with respect to real local time goes away; the clock stops ticking forward, and only resumes when the lid is opened again.

The thing I'd like to draw attention to here is that when you get one of these clocks, you *do not* get a parallel facility that allows you to identify whether a suspend has happened (or, for that matter, when the wall clock has stepped). Or at least, nobody's proposed one for Python. I proposed one for Twisted, http://twistedmatrix.com/trac/ticket/2424#comment:26, but you need an event loop for that, because you need to be able to register interest in that event.

I believe that the fact that these clocks are only semi-steady, or only steady with respect to certain kinds of time, is why the term "monotonic clock" remains so popular, despite the fact that mathematical monotonicity is not actually their most useful property. While these OS-provided clocks have other useful properties, they only have those properties under specific conditions which you cannot necessarily detect and you definitely cannot enforce. But they all remain monotonic in the mathematical sense (modulo hardware and OS bugs), so it is the term "monotonic" which comes to label all their other, more useful, but less reliable properties.

> The people in class 1 are more correct, technically, and far more numerous, but the concept from 1 is a useless concept that should be forgotten.

Technically correct; the best kind of correct! The people in class 1 are only "more correct" if you accept that mis-applying jargon from one field (mathematics) to replace generally-accepted terminology in another field (software clocks) is the right thing to do.
I think it's better to learn the local jargon and try to apply it consistently. If you search around the web for the phrase "monotonic clock", it's applied in a sense closest to the one you mean on thousands and thousands of web pages. "Steady clock" generally applies with reference to C++, and even then is often found in phrases like "is_steady indicates whether this clock is a monotonic clock". Software developers mis-apply mathematical terms like "isomorphic", "orthogonal", "incidental", "tangential", and "reflexive" all the time. Physicists and mathematicians also disagree on the subtleties of the same terms. Context is everything.

> So before proceeding, we should mutually agree that we have no interest in implementing a clock of type 1. It wouldn't serve anyone's use case (correct me if I'm wrong!) and the major operating systems don't offer such a thing anyway.

+1.

> Then, if we all agree to stop thinking about that first concept, then we need to agree whether we're all going to use the word "monotonic clock" to refer to the second concept, or if we're going to use a different word (such as "steady clock") to refer to the second concept.

I would prefer
Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?
On Apr 2, 2012, at 10:39 AM, Kristján Valur Jónsson wrote:

> no steps is something unquantifiable. All time has steps in it.

"No steps" means something very specific when referring to time APIs, as I recently explained here: http://article.gmane.org/gmane.comp.python.devel/131487/.

-glyph
Re: [Python-Dev] Playing with a new theme for the docs
On Mar 21, 2012, at 6:28 PM, Greg Ewing wrote:

> Ned Batchelder wrote:
>> Any of the tweaks people are suggesting could be applied individually using this technique. We could just as easily choose to make the site left-justified, and let the full-justification fans use custom stylesheets to get it.
> Is it really necessary for the site to specify the justification at all? Why not leave it to the browser and whatever customisation the user chooses to make?

It's design. It's complicated. Maybe yes, if you look at research related to default usage patterns, saccade distance, reading speed, and retention latency. Maybe no, if you look at research related to fixation/focus time, eye strain, and non-linear access patterns. Maybe maybe, if you look at the subjective aesthetics of the page according to various criteria, like "does it look like a newspaper" and "do I have to resize my browser every time I visit a new site to get a decent width for reading".

As has been said several times previously in this thread, it's best to leave this up to a design czar who will at least make some decisions that will make some people happy. I'm fairly certain it's not possible to create a design that's optimal for all readers in all cases.

-glyph
Re: [Python-Dev] Issue 13524: subprocess on Windows
On Mar 21, 2012, at 4:38 PM, Brad Allen wrote:

> I tripped over this one trying to make one of our Python apps at work Windows compatible. We had no idea that a magic 'SystemRoot' environment variable would be required, and it was causing issues for pyzmq. It might be nice to reflect the findings of this email thread on the subprocess documentation page: http://docs.python.org/library/subprocess.html Currently the docs mention this:
>
> Note: If specified, env must provide any variables required for the program to execute. On Windows, in order to run a side-by-side assembly the specified env must include a valid SystemRoot.
>
> How about rewording that to:
>
> Note: If specified, env must provide any variables required for the program to execute. On Windows, a valid SystemRoot environment variable is required for some Python libraries such as the 'random' module. Also, in order to run a side-by-side assembly the specified env must include a valid SystemRoot.

Also, in order to execute in any installation environment where libraries are found in non-default locations, you will need to set LD_LIBRARY_PATH. Oh, and you will also need to set $PATH on UNIX so that libraries can find their helper programs, and %PATH% on Windows so that any compiled dynamically-loadable modules and/or DLLs can be loaded. And by the way, you will also need to relay DYLD_LIBRARY_PATH, not LD_LIBRARY_PATH, if you did a UNIX-style build on OS X. Don't forget that you probably also need PYTHONPATH to make sure any subprocess environments can import the same modules as their parent. Not to mention SSH_AUTH_SOCK if your application requires access to _remote_ process spawning, rather than just local. Oh, and DISPLAY, in case your subprocesses need GUI support from an X11 program (which sometimes you need just to initialize certain libraries which don't actually do anything with a GUI). Oh, and __CF_USER_TEXT_ENCODING is important sometimes too; don't forget that.
And if your subprocess is in Perl or Ruby or Java, you may need a couple dozen other variables which your deployment environment has set for you too. Did I mention CFLAGS or LC_ALL yet? Let me tell you a story about this one HP/UX machine... Ahem.

Bottom line: it seems like screwing with the process-spawning environment to make it minimal is a good idea for simplicity, for security, and for modularity. But take it from me, it isn't. I guarantee you that you don't actually know what is in your operating system's environment, and initializing it is a complicated, many-step dance which some vendor or sysadmin or product integrator figured out how to do much better than your hapless Python program can. %SystemRoot% is just the tip of a very big, very nasty iceberg. Better not to keep refining exactly why it's required, or someone will eventually be adding a new variable (starting with %APPDATA% and %HOMEPATH%) that can magically cause your subprocess not to spawn properly to this page every six months for eternity.

If you're spawning processes as a regular user, you should just take the environment you're given, perhaps with a few specific, light additions whose meaning you understand. If you're spawning a process as an administrator or root, you should probably initialize the environment for the user you want to spawn that process as, using an OS-specific mechanism like login(1). (Sorry that I don't know the Windows equivalent.)

-glyph
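The regular-user recommendation above can be sketched in a few lines: inherit the parent environment wholesale and add only what you understand (the MYAPP_CONFIG variable here is a hypothetical addition):

```python
import os
import subprocess
import sys

# Inherit the vendor-initialized environment (SystemRoot and all) and
# make only a light, well-understood addition; never build a "minimal"
# env dict from scratch.
env = os.environ.copy()
env["MYAPP_CONFIG"] = "/etc/myapp.conf"   # hypothetical addition

out = subprocess.check_output(
    [sys.executable, "-c",
     "import os; print(os.environ['MYAPP_CONFIG'])"],
    env=env,
)
assert out.strip() == b"/etc/myapp.conf"
```

On Windows, the copy automatically carries %SystemRoot%, %PATH%, and whatever else the platform dance put there, which is the whole point.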
Re: [Python-Dev] sharing sockets among processes on windows
On Mar 13, 2012, at 5:27 PM, Kristján Valur Jónsson wrote:

> Hi, I'm interested in contributing a patch to duplicate sockets between processes on Windows. The API to do this is WSADuplicateSocket()/WSASocket(), as already used by dup() in _socketmodule.c. Here's what I have:

Just in case anyone is interested, we also have a ticket for this in Twisted: http://twistedmatrix.com/trac/ticket/4389. It would be great to share code as much as possible.

-glyph
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Feb 1, 2012, at 12:46 PM, Guido van Rossum wrote:

> I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. (...) And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here).

Just as a general point of advice to open source contributors, I'd suggest erring on the side of the latter suggestion rather than the former: dump your current mess, along with the relevant caveats (it's a mess, much of it is irrelevant), so that other developers can help you clean it up, rather than putting the entire burden of the cleanup on yourself. Experience has taught me that most people who hold back work because it needs cleanup eventually run out of steam, and their work never gets integrated and maintained.

-glyph
Re: [Python-Dev] Packaging and setuptools compatibility
On Jan 24, 2012, at 12:54 PM, Alexis Métaireau wrote:

> I'm wondering if we should support that (a way to have plugins) in the new packaging thing, or not. If not, this means we should come up with another solution to support this outside of packaging (maybe in distribute). If yes, then we should design it, and probably make it a sub-part of packaging.

First, my interest: Twisted has its own plugin system, and I would like it to continue to work in the future.

I do not believe that packaging should support plugins directly. Run-time metadata is not the packaging system's job. However, the packaging system does need to provide some guarantees about how to install and update data at installation (and post-installation) time, so that databases of plugin metadata may be kept up to date. Basically, packaging's job is constructing explicitly declared parallels between your development environment and your deployment environment. Some such databases are outside of Python entirely (for example, you might think of /etc/init.d as such a database), so even if you don't care about the future of Twisted's weirdo plugin system, it would be nice for this to be supported.

In other words, packaging should have a meta-plugin system: a way for a plugin system to register itself and provide an API for things to install their metadata, and a way to query the packaging module about how a Python package is installed, so that it can put things near to it in an appropriate way. (Keep in mind that "near to it" may mean in a filesystem directory, or a zip file, or stuffed inside a bundle or executable.)

In my design of Twisted's plugin system, we used PEP 302 as this sort of meta-standard, and (modulo certain bugs in easy_install and pip, most of which are apparently getting fixed in pip pretty soon) it worked out reasonably well. The big missing pieces are post-install and post-uninstall hooks. If we had those, translating to native packages for Twisted (and for things that use it) could be made totally automatic.

-glyph
Re: [Python-Dev] Fixing the XML batteries
On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote:

> Note, however, that html5lib is likely way too big to add it to the stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, which would be the target release series for better HTML support. So, whatever library or API you would want to use for HTML processing is currently only the second question, as long as Py3 lacks a real-world HTML parser in the stdlib, as well as a robust character detection mechanism. I don't think that can be fixed all that easily.

Here's the problem in a nutshell, I think:

- Everybody wants an HTML parser in the stdlib, because it's inconvenient to pull in a dependency for such a simple task.
- Everybody wants the stdlib to remain small, stable, and simple, and not get overcomplicated.
- Parsing arbitrary HTML5 is a monstrously complex problem, for which there exist rapidly-evolving standards and libraries. Parsing 'the web' (which is rapidly growing to include stuff like SVG, MathML, etc.) is even harder.

My personal opinion is that html5lib gets this problem almost completely right, and so it should be absorbed by the stdlib. Trying to re-invent this from scratch, or even to use something like BeautifulSoup, which uses a bunch of heuristics and hacks rather than reference to the laboriously-crafted standard that says exactly how parsing malformed stuff has to go to be like a browser, seems like it will just give the stdlib solution a reputation for working on the test input but not working in the real world. (No disrespect to BeautifulSoup: it was a great attempt in the pre-HTML5 world it was born into, and I've used it numerous times to implement useful things. But much more effort has been poured into this problem since then, and the problems are better understood now.)

-glyph
Re: [Python-Dev] Fixing the XML batteries
On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

> A little data: the html5lib project lives at https://code.google.com/p/html5lib/ It has 4 owners and 22 other committers. The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention of Python 3 compatibility elsewhere that I saw, including the PyPI listing, I assume that it is for Python 2 only.

I believe that you are correct.

> A comment on a recent (July 11) Python 3 issue https://code.google.com/p/html5lib/issues/detail?id=187 suggests that the Python 3 version still has problems: "Merged in now, though still lots of errors and failures in the testsuite."

I don't see what bearing this has on the discussion. There are three possible ways I can imagine to interpret this information. First, you could believe that porting a codebase from Python 2 to Python 3 is much easier than solving a difficult domain-specific problem. In that case, html5lib has done the hard part, and someone interested in HTML-in-the-stdlib should do the rest. Second, you could believe that porting a codebase from Python 2 to Python 3 is harder than solving a difficult domain-specific problem, in which case something is seriously wrong with Python 3 or its attendant migration tools, and that needs to be fixed, so someone should fix that rather than worrying about parsing HTML right now. (I doubt that many subscribers to this list would share this opinion, though.) Third, you could believe that parsing HTML is not a difficult domain-specific problem. But only a crazy person would believe that, so you're left with one of the previous options :).

-glyph
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:

> Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...)

Yes. Twisted does, in various MIME-ish places (IMAP, SIP), although not in HTTP from what I can see. I imagine other similar software would as well. One issue is that you don't always have a string literal to work with. If you're proxying traffic, you start from a mis-cased header and you possibly need to correct it to a canonically-cased one. (On at least one occasion I've had to use such a proxy to make certain buggy client software work.) Of course you could have something like {b'CONNECTION-LOST': b'Connection-Lost', ...} somewhere at module scope, but that feels a bit sillier than just having a nice '.title()' method. -glyph
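A runnable sketch of the point being made (the function name and header values are invented for illustration): in Python 3, bytes objects have a .title() method, so a proxy can canonicalize a mis-cased header without maintaining any lookup table at module scope.

```python
# A proxy receives a mis-cased header name and must forward the
# canonically-cased form; bytes.title() does the work that a
# hand-maintained {b'CONNECTION-LOST': b'Connection-Lost', ...}
# mapping would otherwise hard-code.
def canonicalize(header: bytes) -> bytes:
    # .title() capitalizes each hyphen-separated word and lowercases
    # the rest, which matches conventional HTTP header casing.
    return header.title()

print(canonicalize(b"CONNECTION-LOST"))  # b'Connection-Lost'
print(canonicalize(b"x-FORWARDED-for"))  # b'X-Forwarded-For'
```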
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:

> How about title?
>
> >>> 'content-length'.title()
> 'Content-Length'

You might say that the protocol has to be case-insensitive so this is a silly frill: there are definitely enough case-sensitive crappy bits of network middleware out there that this function is critically important for an HTTP server. In general I'd like to defend keeping as many of these methods as possible for compatibility (porting to Py3 is already hard enough). Although even I might have a hard time defending 'swapcase', which is never used _at all_ within Twisted, on text or bytes. The only use-case I can think of for that method is goofy joke text filters, and it wouldn't be very good at that either. -glyph
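The quoted interpreter snippet generalizes: for most header names, .title() produces the canonical capitalization. (The caveat below about acronym-bearing names is an editorial observation, not something from the original message.)

```python
# .title() as a header-name canonicalizer, per Stephen's suggestion.
headers = ["content-length", "CONNECTION", "x-forwarded-for"]
print([h.title() for h in headers])
# ['Content-Length', 'Connection', 'X-Forwarded-For']

# The wrinkle: names containing acronyms don't come out canonically,
# so .title() is a convenience, not a complete canonicalization scheme.
print("content-md5".title())  # 'Content-Md5', not 'Content-MD5'
```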
Re: [Python-Dev] Python 3 optimizations continued...
On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote:

> A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well).

Correct me if I'm wrong: doesn't this mean that no profiler will be able to accurately measure the performance impact of the new instruction format, and that one may therefore get incorrect data when one is trying to make a CPU optimization for real-world performance?
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:

> In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library.

Unfortunately I don't know a lot about this, but I keep hearing about something called rffi that PyPy uses to call C from RPython: http://readthedocs.org/docs/pypy/en/latest/rffi.html. This has some shortcomings currently, most notably the fact that it needs those .h files (and therefore a C compiler) at runtime, so it's currently a non-starter for code distributed to users. Not to mention the fact that, as you can see, it's not terribly thoroughly documented. But that ExternalCompilationInfo object looks very promising, since it has fields like includes, libraries, etc. It seems like it's a bit more type-safe than ctypes or Cython, and it seems to me that it could cache some of the information that it extracts from header files and store it for later, when a compiler might not be around. Perhaps someone with more PyPy knowledge than I could explain whether this is a realistic contender for other Python runtimes?
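For contrast with rffi's headers-at-runtime approach, here is a minimal sketch of what the stdlib's ctypes requires instead: since ctypes never reads a .h file, the programmer must restate by hand exactly the ABI information Guido notes lives in the headers. (The choice of libm and pow is just for illustration; find_library's behavior is platform-dependent.)

```python
import ctypes
import ctypes.util

# find_library consults platform-specific naming conventions; no
# compiler or header files are involved at any point.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# This is the information that normally lives in math.h -- restated
# manually, with no way for ctypes to check it against reality.
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

print(libm.pow(2.0, 10.0))  # 1024.0
```

Getting argtypes/restype wrong doesn't produce a compile error; it produces garbage or a crash at call time, which is the type-safety gap being discussed.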
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

> That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3.

Thanks for this feedback. I honestly did not realize how old and creaky this code had gotten. It was originally developed for Python 2.4 and it certainly shows its age. Practically speaking, the code is correct for the bundled importers, and paths and zipfiles are all we've cared about thus far.

> (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - walk the module hierarchy without importing any code. So some heuristics are necessary. Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be. However, the isPackage() method can and should be looking at the module if it's already loaded, and not always guessing based on paths. The whole reason there's an 'importPackages' flag to walk() is that some applications of this code care more about accuracy than others, so it tries to be as correct as it can be. (Of course this is still wrong for the case where a __path__ is dynamically constructed by user code, but there's only so well one can do at that.)

> If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.
This code still needs to support Python 2.4, but I will make a note of this for future reference.

> (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

>> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

> The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of these design decisions.

> So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling Perl user's narrative :).
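A small sketch of the pkgutil approach being recommended: iter_modules() discovers modules by delegating to the importers on sys.path (per PEP 302), rather than by inspecting path strings directly, and it does so without importing the modules it finds.

```python
import pkgutil

# Discover top-level modules and packages without importing them;
# each entry comes from whatever PEP 302 finder claimed that
# sys.path entry, so zipfiles and other importers work too.
discovered = {info.name for info in pkgutil.iter_modules()}

# Stdlib packages are visible even though nothing was imported.
print("json" in discovered, "email" in discovered)
```

(This uses the modern ModuleInfo.name attribute; the mechanism, though not this exact spelling, is what was available as of Python 2.5.)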
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

> At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.
> The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.)

Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)

> However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.

I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py. I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools, though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think.
I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package. This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace. This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc., as they currently do with __init__.py. Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than "execute all over each other in no particular order". Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path and refuse to import things from it, and, when a submodule was run as a script, it would add the nearest directory not marked as a package to sys.path, rather than the script's directory itself. The whole "__name__ is wrong because your current directory was wrong when you ran that command" thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history. But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do.

> In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again!

What do you mean "building of a virtual path"?

> (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)
> Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the happier I'll be :).

>> This code still needs to support Python 2.4, but I will make a note of this for future reference.

> A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking. Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

One of the stipulations of this code is that it might give different results when the modules are loaded and not. So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote:

> On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
>>> * XXX what is the __file__ of a pure virtual package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad?
>> A pure virtual package having no source file, I think it should have no __file__ at all. I don’t know if that would break more code than using an empty string for example, but it feels righter.
> I agree that the empty string is the worst of the choices. no __file__ or __file__=None is better.

In some sense, I agree: hacks like empty strings are likely to lead to path-manipulation bugs where the wrong file gets opened (or worse, deleted, with predictable deleterious effects). But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective: http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html (or https://launchpad.net/modules if you are not a Twisted user). Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes. Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects. Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.
Right now I have to puzzle out the intent of the final API from the problem/solution description and thought experiment. Despite authoring several namespace packages myself, I don't have any of the problems described in the PEP. I just want to know how to write correct tools given this new specification. I suspect that this PEP will be the only reference for how packages work for a long time to come (just as PEP 302 was before it), so it should really get this right.
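For what it's worth, the behavior Python eventually shipped (in PEP 420, the accepted successor to PEP 402) can be observed directly: a directory with no __init__.py imports as a namespace package whose module object has no usable __file__, only a __path__. The package name below is invented for this sketch.

```python
import importlib
import os
import sys
import tempfile

# Create <tmpdir>/demo_nspkg/ containing no __init__.py at all.
parent = tempfile.mkdtemp()
os.mkdir(os.path.join(parent, "demo_nspkg"))
sys.path.insert(0, parent)

mod = importlib.import_module("demo_nspkg")
print(getattr(mod, "__file__", None))  # no source file: None (or unset)
print(list(mod.__path__))              # the single directory created above
```

So the "no __file__ at all / __file__=None" position quoted above is essentially what tooling has to handle today.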
Re: [Python-Dev] HTMLParser and HTML5
On Jul 29, 2011, at 7:46 AM, Stefan Behnel wrote:

> Joao S. O. Bueno, 29.07.2011 13:22:
>> On Fri, Jul 29, 2011 at 1:37 AM, Stefan Behnel wrote:
>>> Brett Cannon, 28.07.2011 23:49:
>>>> On Thu, Jul 28, 2011 at 11:25, Matt wrote:
>>>>> - What policies are in place for keeping parity with other HTML parsers (such as those in web browsers)?
>>>> There aren't any beyond "it would be nice". [...] It's more of an issue of someone caring enough to do the coding work to bring the parser up to spec for HTML5 (or introduce new code to live beside the HTML4 parsing code).
>>> Which, given that html5lib readily exists, would likely be a lot more work than anyone who is interested in HTML5 handling would want to invest. I don't think we need a new HTML5 parsing implementation only to have it in the stdlib. That's the old sunny Java way of doing it.
>> I disagree. Having proper html parsing out of the box is part of the "batteries included" thing.
> Well, you can easily prove me wrong by implementing this. Stefan

Please don't implement this just to prove Stefan wrong :). The thing to do, if you want html parsing in the stdlib, is to _incorporate_ html5lib, which is already a perfectly good, thoroughly tested HTML parser, and simply deprecate HTMLParser and friends. Implementing a new parser would serve no purpose I can see. -glyph
Re: [Python-Dev] HTMLParser and HTML5
On Jul 29, 2011, at 3:00 PM, Matt wrote:

> I don't see any real reason to drop a decent piece of code (HTMLParser, that is) in favor of a third party library when only relatively minor updates are needed to bring it up to speed with the latest spec.

I am not really one to throw stones here, as Twisted contains a lenient pseudo-XML parser which I still maintain - one which decidedly does not agree with HTML5's requirements for dealing with invalid data, but is just a bunch of ad-hoc guesses of my own. My impression of HTML5 is that HTMLParser would require significant modifications and possibly a drastic re-architecture in order to really do HTML5 right; especially the parts that the html5lib authors claim make HTML5 streaming-unfriendly, i.e. subtree reordering when encountering certain types of invalid data. But if I'm wrong about that, and there are just a few spec updates and bugfixes that need to be applied, by all means, ignore my comment. -glyph
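To make the gap concrete (using the Python 3 spelling, html.parser): the stdlib parser reports tag events for invalid markup exactly as written, with none of the tree-construction recovery HTML5 specifies - the unclosed <b> below is simply never closed, where an HTML5 parser would reconstruct the tree around it.

```python
from html.parser import HTMLParser

# Collect start/end tag events to see what the stdlib parser reports
# for invalid input.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)

parser = TagCollector()
parser.feed("<p><b>bold text</p>")  # invalid: <b> is never closed
print(parser.starts, parser.ends)   # ['p', 'b'] ['p']
```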
Re: [Python-Dev] Comments of the PEP 3151
On Jul 26, 2011, at 6:49 PM, Antoine Pitrou wrote:

> On Mon, 25 Jul 2011 15:28:47 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>> There may be some error codes that we choose to map to these generic errors, even if we don't give them their own exception types at this point (e.g. ECONSHUTDOWN could map directly to ConnectionError).
> Ok, I can find neither ECONSHUTDOWN nor ECONNSHUTDOWN on www.opengroup.org, and it's not mentioned in errnomodule.c. Is it some system-specific error code?

I assume that ESHUTDOWN is the errno in question? (This is also already mentioned in the PEP.)
Re: [Python-Dev] The socket HOWTO
On Jun 5, 2011, at 3:35 PM, Martin v. Löwis wrote:

> And that's all fine. I still claim that you have to *understand* sockets in order to use it properly. By this, I mean stuff like "what is a TCP connection?", "how is it established?", "how is UDP different from TCP?", "when data arrives, what layers of software does it go through?", "what is a port number?", etc.

Yes, these are all excellent concepts to be familiar with. But the word "socket" (and the socket HOWTO) refers to a specific way to interface with those concepts, the Berkeley socket API: http://en.wikipedia.org/wiki/Berkeley_sockets. Which you don't have to know anything about if you're going to use Twisted. You should know about IPC in general, and TCP/UDP specifically, if you're going to use Twisted, but sockets are completely optional. Also, I feel that I should point out that the sockets HOWTO does not cover even a single one of these concepts in any useful depth. If you think that these are what it should be explaining, it needs some heavy editing. Here's what it has to say about each one:

"what is a TCP connection?" - The only place that the characters "TCP" appear in the entire document is in the phrase "... which is completely different from TCP_NODELAY". Nowhere is a TCP connection explained at a conceptual level, except to say that it's something a web browser does.

"how is UDP different from TCP?" - The phrase "UDP" never appears in the HOWTO. DGRAM sockets get a brief mention as "anything else" in the sentence "... you’ll get better behavior and performance from a STREAM socket than anything else". (To be fair, I do endorse teaching that the difference between TCP and UDP is "that you should not use UDP" to anyone not sufficiently advanced to read the relevant reference documentation themselves.)

"when data arrives, what layers of software does it go through?" - There's no discussion of this that I can find at all.

"what is a port number?"
Aside from a few comments in the code examples, the only discussion of port numbers is "low number ports are usually reserved for “well known” services (HTTP, SNMP etc)". It would be very good to have a Python networking overview somewhere that explained this stuff at a very high level, and described how data might get into or out of your program, with links to things like the socket HOWTO that describe more specific techniques. This would be useful because most commonly, I think that data will get into Python network programs via WSGI, not direct sockets or anything like Twisted. To be clear, having read it now: I do _not_ agree with Antoine that this document should be deleted. I dimly recall that it helped me understand some things in the very early days of Twisted. While it's far from perfect, it might help someone in a similar situation understand those things as well today. I just found it interesting that the main concepts one would associate with such a HOWTO are nowhere to be found :). -glyph
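Since the HOWTO never shows it, here is about the smallest possible demonstration of the TCP/UDP distinction, run entirely over loopback in a single process: a DGRAM socket exchanges self-contained datagrams with no connection and no handshake at all.

```python
import socket

# Receiver: bind a UDP socket to an OS-assigned loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick one
addr = receiver.getsockname()

# Sender: no connect() and no handshake -- just fire a datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, peer = receiver.recvfrom(1024)
print(data)  # b'hello'

sender.close()
receiver.close()
```

The same exchange over TCP (SOCK_STREAM) would require listen(), connect(), and accept() before any data could flow: that is the connection the HOWTO never explains.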
Re: [Python-Dev] The socket HOWTO
On Jun 4, 2011, at 11:32 PM, Martin v. Löwis wrote:

> b) telling people to use Twisted or asyncore on the server side if they are new to sockets is bad advice. People *first* have to understand sockets, and *then* can use these libraries and frameworks. Those libraries aren't made to be black boxes that work even if you don't know how - you *have* to know how they work inside, or else you can't productively use them.

First, Twisted doesn't always use the BSD sockets API; the Windows IOCP reactor, especially, starts off with the socket() function, but things go off in a different direction pretty quickly from there. So it's perfectly fine to introduce yourself to networking via Twisted, and many users have done just that. If you're using it idiomatically, you should never encounter a socket object or file descriptor poking through the API anywhere. Asyncore is different: you do need to know how sockets work in order to use it, because you're expected to call .send() and .recv() yourself. (And, in my opinion, this is a serious design flaw, for reasons which will hopefully be elucidated in the PEP that Laurens is now writing.) Second, it makes me a little sad that it appears to be folk wisdom that Twisted is only for servers. A lot of work has gone into making it equally appropriate for clients. This is especially true if your client has a GUI, where Twisted is often better than a protocol-specific library, which may either be blocking or have its own ad-hoc event loop. I don't have an opinion on the socket HOWTO per se, only on the possibility of linking to Twisted as an alternate implementation mechanism. It really would be better to say "go use Twisted rather than reading any of the following" than "read the following, which will help you understand Twisted".
Re: [Python-Dev] Python 3.x and bytes
On May 19, 2011, at 1:43 PM, Guido van Rossum wrote:

> -1; the result is not a *character* but an integer.

Well, really the result ought to be an octet, but I suppose adding an 'octet' type is beyond the scope of even this sprawling discussion :).

> I'm personally favoring using b'a'[0] and possibly hiding this in a constant definition.

As someone who spends a frankly unfortunate amount of time handling protocols where things like this are necessary, I agree with this recommendation. In protocols where one needs to compare network data with one-byte type identifiers or packet prefixes, more (documented) constants and less inscrutable junk like

    if p == 'c': ...
    elif p == 'j': ...
    elif p == 'J': # for compatibility ...

would definitely be a good thing. Of course, I realize that this sort of programmer will most likely replace those constants with 99, 106, 74 rather than take a moment to document what they mean, but at least they'll have to pause for a moment and realize that they have now lost _all_ mnemonics... In fact, I feel like I would want to push in the opposite direction: don't treat one-byte bytes slices less like integers; I wish I could more easily treat n-byte sequences _more_ like integers! :). More protocols have 2-byte or 4-byte network-endian packed integers embedded in them than have individual tag bytes that I want to examine. For the typical ASCII-ish protocol where you want to look at command names and CRLF-separated messages, you'd never want to look at an individual octet, and stringish operations like split() will give you what you want.
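The two styles contrasted above, as a sketch (the packet layout and constant name are invented): a one-byte tag indexes out of a bytes object as an int in Python 3, and the multi-byte network-endian fields the message wishes were easier to handle are exactly what struct unpacks.

```python
import struct

# A made-up packet: one tag byte followed by a 2-byte big-endian length.
TAG_COMPAT = ord("J")    # a named constant beats a bare 74
packet = b"J\x00\x10"

tag = packet[0]          # indexing bytes yields an int, not bytes
print(tag == TAG_COMPAT)  # True

(length,) = struct.unpack("!H", packet[1:3])  # '!' = network endian
print(length)  # 16
```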
Re: [Python-Dev] Linus on garbage collection
On May 6, 2011, at 12:31 PM, Michael Foord wrote:

> pypy and .NET choose to arbitrarily break cycles rather than leave objects unfinalised and memory unreclaimed. Not sure what Java does.

I think that's a mischaracterization of their respective collectors; "arbitrarily break cycles" implies that user code would see broken or incomplete objects, at least during finalization, which I'm fairly sure is not true on either .NET or PyPy. Java definitely has a collector that can handle cycles too. (None of these are reference counting.) -glyph
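CPython itself illustrates the distinction: its reference counting is supplemented by a cycle collector, so cycles are reclaimed without user code ever observing a half-broken object. A minimal sketch:

```python
import gc

# Build a reference cycle: the list contains itself, so its refcount
# can never fall to zero through reference counting alone.
cycle = []
cycle.append(cycle)
del cycle

# The cycle collector finds and reclaims the unreachable cycle;
# collect() returns the number of unreachable objects it found.
print(gc.collect() >= 1)  # True
```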
Re: [Python-Dev] Linus on garbage collection
Apologies in advance for contributing to an obviously and increasingly off-topic thread, but this kind of FUD about GC is a pet peeve of mine.

On May 6, 2011, at 10:04 AM, Neal Becker wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Counterpoint: http://lwn.net/Articles/268783/. Sorry Linus, sometimes correctness matters more than performance. But even the performance argument is kind of bogus. See, for example, this paper on real-time garbage collection: http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html. That's just one example of an easy-to-find solution to a problem that Linus holds up as unsolved or unsolvable. There are solutions to pretty much all of the problems that Linus brings up. One of these solutions is even famously implemented by CPython! The CPython string += idiom optimization fixes at least one case of the "you tend to always copy the node" antipattern Linus describes, and lots of languages (especially Scheme and derivatives, IIRC) have very nice optimizations around this area. One could argue that any functional language without large pools of mutable state (i.e. Erlang) is a massive optimization for this case.

Another example: the "dirty cache" problem Linus talks about can be addressed by having a GC that cooperates with the VMM: http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf. And the "re-using stuff as fast as possible" thing is exactly the kind of problem that generational GCs address. When you run out of space in cache, you reap your first generation before you start copying stuff. One of the key insights of generational GC is that you'll usually reclaim enough (in this case, cache-local) memory that you can keep going for a little while. You don't have to read a super fancy modern paper on this; Wikipedia explains nicely: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29.
Of course if you don't tune your GC at all for your machine-specific cache size, you won't see this performance benefit play out.

I don't know if there's a programming language and runtime with a real-time, VM-cooperating garbage collector that actually exists today which has all the bells and whistles required to implement an OS kernel, so I wouldn't give the Linux kernel folks too much of a hard time for still using C; but there's nothing wrong with the idea in the abstract. The performance differences between automatic and manual GC are dubious at best, and with a really good GC and a language that supports it, GC tends to win big. When it loses, it loses in ways which can be fixed in one area of the code (the GC) rather than millions of tiny fixes across your whole codebase, as is the case with strategies used by manual collection algorithms.

The assertion that modern hardware is not designed for big data-structure pointer-chasing is also a bit silly. On the contrary, modern hardware has evolved staggeringly massive caches, specifically because large programs (whether they're GC'd or not) tend to do lots of this kind of thing, because there's a certain level of complexity beyond which one can no longer avoid it. It's old hardware, with tiny caches (that were, by virtue of their tininess, closer to the main instruction-processing silicon), that was optimized for the "carefully stack-allocating everything in the world to conserve cache" approach. You can see this pretty clearly by running your favorite Python benchmark of choice on machines which are similar except for cache size. The newer machine, with the bigger cache, will run Python considerably faster, but doesn't help the average trivial C benchmark that much - or, for that matter, Linux benchmarks. -glyph
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 12:59 PM, Guido van Rossum wrote:

> On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote:
>> In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :)
> I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. In regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as "if you still want the test to happen with -O, don't use assert".)

You're both right! :) My take on assert is "don't use it, ever". assert is supposed to be about conditions that never happen. So, to run through the few cases where I might use it:

If I use it to enforce a precondition, it's wrong because under -OO my preconditions won't be checked and my input might be invalid.

If I use it to enforce a postcondition, then my API's consumers have to occasionally handle this weird error, except it won't be checked under -OO so they won't be able to handle it consistently.

If I use it to try to make assertions about internal state during a computation, then I introduce an additional, untested (at the very least untested under -OO), probably undocumented (did I remember to say "and raises AssertionError when..." in its docstring?) code path where, when this bad thing happens, I get an exception instead of a result. If that's an important failure mode, then there ought to be a documented exception, which the computation's consumers can deal with. If it really should never happen, then I really should have just written some unit tests verifying that it doesn't happen in any case I can think of. And I shouldn't be writing code to handle cases I can't come up with any way to exercise, because how do I know that it's going to do the right thing?
(If I had a dollar for every 'assert' message that didn't have the right number of arguments to its format string, etc.) Also, when things that should never happen do actually happen in real life, is a random exception that interrupts the process actually an improvement over just continuing on with some potentially bad data? In most cases, no, it really isn't, because by blowing up you've removed the ability of the user to take corrective action or do a workaround. (In the cases where blowing up is better because you're about to do something destructive, again, a test seems in order.) My Python code is very well documented, which means that there is sometimes a significant runtime overhead from docstrings. That's really my only interest in -OO: reducing memory footprint of Python processes by dropping dozens of megabytes of library documentation from each process. The fact that it changes the semantics of 'assert' is an unfortunate distraction. So the only time I'd even consider using 'assert' is in a throwaway script which might be run once, that I'm not going to write any tests for and I'm not going to maintain, but I might care about just enough to want to blow up instead of calling 'os.unlink' if certain conditions are not met. (But then every time I actually use it that way, I realize that I should have dealt with the error sanely and I probably have to go back and fix it anyway.)
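The -O objection can be seen directly: the same precondition written as an assert disappears under optimization, while an explicit exception survives. A minimal sketch (the function names are illustrative, not from the thread):

```python
def withdraw_assert(balance, amount):
    # Precondition as an assert: silently skipped under `python -O`,
    # so a negative amount slips through and *increases* the balance.
    assert amount >= 0, "amount must be non-negative"
    return balance - amount

def withdraw_checked(balance, amount):
    # Precondition as a documented exception: enforced regardless of
    # optimization flags, and callers can handle it consistently.
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return balance - amount
```

Under `python -O`, `withdraw_assert(100, -5)` returns 105 with no complaint; `withdraw_checked` raises `ValueError` no matter how the interpreter was invoked.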
Re: [Python-Dev] python and super
On Apr 14, 2011, at 12:59 PM, Ronald Oussoren wrote: What would the semantics be of a super that (...) I think it's long past time that this discussion moved to python-ideas, if you don't mind.
Re: [Python-Dev] Supporting Visual Studio 2010
On Apr 5, 2011, at 8:52 AM, exar...@twistedmatrix.com wrote: On 09:58 am, mar...@v.loewis.de wrote: Won't that still be an issue despite the stable ABI? Extensions on Windows should be linked to the same version of MSVCRT used to compile Python Not if they use the stable ABI. There still might be issues if you mix CRTs, but none related to the Python ABI - in particular, none of those crashing conditions can arise from the stable ABI. Does this mean new versions of distutils let you build_ext with any C compiler, instead of enforcing the same compiler as it has done previously? That would be great. That *would* be great. But is it possible? http://www.python.org/dev/peps/pep-0384/ says functions expecting FILE* are not part of the ABI, "to avoid depending on a specific version of the Microsoft C runtime DLL on Windows". Can extension modules that need to read and write files practically avoid all of those functions? (If your extension module links a library with a different CRT, but doesn't pass functions back and forth to Python, is that OK?) The PEP also says that it will allow users to check whether their modules conform to the ABI, but it doesn't say how that will be done. How can we build extension modules so that we're sure we're ABI-conformant?
Re: [Python-Dev] Policy for making changes to the AST
On Apr 4, 2011, at 2:00 PM, Guido van Rossum wrote: On Mon, Apr 4, 2011 at 10:05 AM, fwierzbi...@gmail.com wrote: As a re-implementor of ast.py that tries to be node for node compatible, I'm fine with #1 but would really like to have tests that will fail in test_ast.py to alert me! [and] On Mon, Apr 4, 2011 at 10:38 AM, Michael Foord fuzzy...@voidspace.org.uk wrote: A lot of tools that work with Python source code use ast - so even though other implementations may not use the same ast under the hood they will probably at least *want* to provide a compatible implementation. IronPython is in that boat too (although I don't know if we *have* a compatible implementation yet - we certainly feel like we *should* have one). Ok, so it sounds like ast is *not* limited to CPython? Oh, definitely not. I would be pretty dismayed if tools like http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/files/head:/Pyflakes/ would not run on Jython or PyPy.
Re: [Python-Dev] Differences among Emacsen
On Mar 30, 2011, at 2:54 PM, Barry Warsaw wrote: On Mar 30, 2011, at 09:43 AM, Ralf Schmitt wrote: Barry Warsaw ba...@python.org writes: In case you missed it, there are now *three* Python modes. Tim Peters' original and best (in my completely unbiased opinion wink) python-mode.el which is still being developed, the older but apparently removed from Emacs python.el and the 'new' (so I've heard) python.el. https://github.com/fgallina/python.el is the fourth one. Wonderful. I have a plea for posterity: since I'm sure that a hundred people will see this post and decide that the best solution to this proliferation of Python plugins for Emacs is that there should be a new one that is even better than all these other ones (and also totally incompatible, of course)... I won't try to stop you all from doing that, but please at least don't call it python.el. This is like if ActiveState, Wing, PyCharm and PyDev for Eclipse had all decided to call their respective projects IDLE because that's what you call a Python IDE :). It would be nice to be able to talk about Python / Emacs code without having to do an Abbott and Costello routine.
Re: [Python-Dev] Finally switch urllib.parse to RFC3986 semantics?
On Mar 18, 2011, at 8:41 PM, Guido van Rossum wrote: Really. Do they still call them URIs? :-) Well, by RFC 398*7* they're calling them IRIs instead. 'irilib', perhaps? ;-)
Re: [Python-Dev] funky buildbot
On Mar 10, 2011, at 3:18 PM, Bill Janssen wrote: It's a new Mac Mini running the latest Snow Leopard, with Python 2.6.1 (the /usr/bin/python) and buildslave 0.8.3, using Twisted 8.2.0. I realize that Python 2.6 is pretty old too, but a _lot_ of bugfixes have gone into Twisted since 8.2. I'm not 100% sure this is a Twisted issue but you may want to try upgrading to 10.2.0 and see if that fixes things. (I have a dim memory of similar issues which were eventually fixed by something in our subprocess support...) -glyph
Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream
On Fri, Mar 4, 2011 at 10:03 AM, Westley Martínez aniko...@gmail.com wrote: On Fri, 2011-03-04 at 00:54 -0800, Aaron DeVore wrote: On Thu, Mar 3, 2011 at 11:44 PM, Kerrick Staley m...@kerrickstaley.com wrote: That way, if the sysadmin does decide to replace the installed python file, he can do so without inadvertently deleting the previously installed binary. Nit pick: Change he to they to be gender neutral. Nit pick: Change they to he to be grammatically correct. If we really have to be gender neutral, change he to he or she. This grammatical rule is a modern fiction with no particular utility. Go ahead and use singular they as a gender-neutral pronoun; it was good enough for Shakespeare, Twain, Austen and Shaw, it should be good enough for Python. http://en.wikipedia.org/wiki/Singular_they#Examples_of_generic_they
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote: On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan ncogh...@gmail.com wrote: On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross hodgestar+python...@gmail.com wrote: I'm changing my vote on this to a +1 for two reasons: * Initially I thought this wasn't supported by Python at all but I see that currently it is supported but that support is broken (or at least limited to UTF-8 filesystem encodings). Since support is there, might as well make it better (especially if it tidies up the code base at the same time). * I still don't think it's a good idea to give modules non-ASCII names but the consenting adults approach suggests we should let people shoot themselves in the foot if they believe they have good reason to do so. I'm also +1 on this for the reasons Simon gives. Same here. *Most* code will never be shared, or will only be shared between users in the same community. When it goes wrong it's also a learning opportunity. :-) Despite my usual proclivity for being contrarian, I find myself in agreement here. Linux users with locales that don't specify UTF-8 frankly _should_ have to deal with all kinds of nastiness until they can transcode their filesystems. MacOS and Windows both have a right answer here and your third-party tools shouldn't create mojibake in your filenames. However, I feel that we should not necessarily be making non-ASCII programmers second-class citizens, if they are to be supported at all. The obvious outcome of the current regime is, if you want your code to work in the wider world, you have to make everything ASCII, so non-ASCII programmers have to do a huge amount of extra work to prepare their stuff for distribution. As an English speaker I'd be happy about that, but as a person with a lot of Chinese in-laws, it gives me pause. There is a difference between sharing code for inspection and editing (where a little codec pain is good for the soul: set your locale to UTF-8 and forget it already!)
and sharing code so that a (non-programming) user can just run it. If I can write software in English and distribute it to Chinese people, fair's fair, they should be able to write it in Chinese and have it work on my computer. To support the latter, could we just make sure that zipimport has a consistent, non-locale-or-operating-system-dependent interpretation of encoding? That way a distributed egg would be importable from a zipfile regardless of how screwed up the distribution target machine's filesystem is. (And this is yet more motivation for distributors to set zip_safe=True.)
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 12:02 AM, Glenn Linderman wrote: But for local code, having to think up an ASCII name for a module rather than use the obvious native-language name, is just brain-burden when creating the code. Is it really? You already had to type 'import', presumably if you can think in Python you can think in ASCII. (After my experiences with namespace crowding in Twisted, I'm inclined to suggest something more like import m_07117FE4A1EBD544965DC19573183DA2 as café - then I never need to worry about café2 looking ugly or cafe being incompatible :).)
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 12:19 AM, Glenn Linderman wrote: Now if the stuff after m_ was the hex UTF-8 of café, that could get interesting :) (As it happens, it's the hex digest of the MD5 of the UTF-8 of café... ;-))
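The mangling joke is easy to reproduce; this sketch assumes only what the message states, that the name is 'm_' plus the uppercased hex MD5 digest of the UTF-8 encoding of café:

```python
import hashlib

name = "café"
# MD5 of the UTF-8 bytes, rendered as lowercase hex, then uppercased
# to match the "m_07117FE4..." style quoted in the previous message.
digest = hashlib.md5(name.encode("utf-8")).hexdigest()
mangled = "m_" + digest.upper()

# The result is a pure-ASCII, importable Python identifier.
assert mangled.isidentifier()
```

The point of the joke being, of course, that the mangled form is collision-free and ASCII-safe but utterly unreadable.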
Re: [Python-Dev] devguide: Point out that OS X users need to change examples to use python.exe instead of
On Jan 10, 2011, at 1:37 PM, Łukasz Langa wrote: I'm using the case-sensitive variant of HFS+ since 10.4. It works, I like it and you get ./python with it. I realize that this isn't a popularity contest for this feature, but I feel like I should pipe up here and mention that it breaks some applications - for example, you can't really install World of Warcraft on a case-sensitive filesystem. Not the filesystem's fault really, but it is a good argument for why users shouldn't choose it.
Re: [Python-Dev] Checking input range in time.asctime and time.ctime
On Jan 5, 2011, at 4:33 PM, Guido van Rossum wrote: Shouldn't the logic be to take the current year into account? By the time 2070 comes around, I'd expect 70 to refer to 2070, not to 1970. In fact, I'd expect it to refer to 2070 long before 2070 comes around. All of which makes me think that this is better left to the app, which can decide for itself whether it is more important to represent dates in the future or dates in the past. The point of this somewhat silly flag (as I understood its description earlier in the thread) is to provide compatibility with POSIX 2-digit dates. As per http://pubs.opengroup.org/onlinepubs/007908799/xsh/strptime.html - %y is the year within century. When a century is not otherwise specified, values in the range 69-99 refer to years in the twentieth century (1969 to 1999 inclusive); values in the range 00-68 refer to years in the twenty-first century (2000 to 2068 inclusive). Leading zeros are permitted but not required. So, 70 means 1970, forever, in programs that care about this nonsense. Personally, by the time 2070 comes around, I hope that 70 will just refer to 70 A.D., and get you odd looks if you use it in a written date - you might as well just write '0' :).
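Python's own strptime already follows the POSIX pivot quoted above, so the "70 means 1970, forever" behavior can be demonstrated directly:

```python
import time

# POSIX %y pivot rule: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
assert time.strptime("70", "%y").tm_year == 1970
assert time.strptime("69", "%y").tm_year == 1969
assert time.strptime("68", "%y").tm_year == 2068
```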
Re: [Python-Dev] Possible optimization for LOAD_FAST ?
On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote: On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor alex.gay...@gmail.com wrote: No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. True. My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. len = len anywhere in the module (even in dead code!) would be sufficient to disable the optimization. But barring someone interested in implementing something based on this rule, the proposal has languished for many years. Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful. If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.
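The mocking pattern at issue is completely ordinary today: a test shadows a builtin by assigning a module-level global from outside the module, which is exactly the case the proposed rule (no explicit assignment *in* the module means the builtin is fair game for constant-folding) would invalidate. A hypothetical sketch, using a synthetic module for self-containment:

```python
import io
import types

# Build a module whose function resolves 'open' the normal way:
# module globals first, then builtins.
mod = types.ModuleType("config_reader")
exec(
    "def read_config(path):\n"
    "    with open(path) as f:\n"
    "        return f.read()\n",
    mod.__dict__,
)

# A test monkeypatches the module's 'open' from the outside -- legal
# under current semantics, with no assignment inside the module itself.
mod.open = lambda path: io.StringIO("fake contents")
assert mod.read_config("ignored.cfg") == "fake contents"
```

If `open` inside `read_config` had been folded to the builtin at compile time, the patched version would silently never be called.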
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote: You end up proliferating types that all do the same kind of thing. Judicious use of inheritance helps, but getting the fundamental abstraction right is hard. Or at least, Emacs hasn't found it in 20 years of trying. Emacs hasn't even figured out how to do general purpose iteration in 20 years of trying either. The easiest way I've found to loop across an arbitrary pile of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even then, you still have to make the arcane and pointless distinction of using 'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well tied up nicely in a bow. I don't know how to respond to the rest of your argument. Nothing you've said has in any way indicated to me why having code-point offsets is a good idea, only that people who know C and elisp would rather sling around piles of integers than have good abstract types. For example: I think it more likely that markers are very expensive to create and use compared to integers. What? When you do 'for x in str' in Python, you are already creating an iterator object, which has to store the exact same amount of state that our proposed 'marker' or 'character pointer' would have to store. The proposed UTF-8 marker would have to do a tiny bit more work when iterating because it would have to combine multibyte characters, but in exchange for that you get to skip a whole ton of copying when encoding and decoding. How is this expensive to create and use? For every application I have ever designed, encountered, or can even conjecture about, this would be cheaper. (Assuming not just a UTF-8 string type, but one for UTF-16 as well, where native data is in that format already.)
For what it's worth, not wanting to use abstract types in Emacs makes sense to me: I've written my share of elisp code, and it is hard to create reasonable abstractions in Emacs, because the facilities for defining types and creating polymorphic logic are so crude. It's a lot easier to just assume your underlying storage is an array, because at the end of the day you're going to need to call some functions on it which care whether it's an array or an alist or a list or a vector anyway, so you might as well just say so up front. But in Python we could just call 'mystring.by_character()' or 'mystring.by_codepoint()' and get an iterator object back and forget about all that junk.
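The 'mystring.by_codepoint()' API above is hypothetical, but a toy sketch shows the shape of the idea: UTF-8 bytes stay the canonical storage, and iteration views are handed out on demand instead of a code-point array being materialized up front.

```python
class U8String:
    """Toy string type whose canonical storage is UTF-8 bytes."""

    def __init__(self, data: bytes):
        self._data = data

    def by_byte(self):
        # Raw byte view, useful for I/O-oriented consumers.
        return iter(self._data)

    def by_codepoint(self):
        # Decoding lazily per call stands in for a real incremental
        # UTF-8 walk; no second persistent representation is kept.
        return iter(self._data.decode("utf-8"))

s = U8String("café".encode("utf-8"))
assert len(list(s.by_byte())) == 5               # 'é' occupies two bytes
assert list(s.by_codepoint()) == ["c", "a", "f", "é"]
```

A real implementation would also offer a grapheme-level view, which this sketch omits.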
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote: Greg Ewing writes: On 24/11/10 22:03, Stephen J. Turnbull wrote: But if you actually need to remember positions, or regions, to jump to later or to communicate to other code that manipulates them, doing this stuff the straightforward way (just copying the whole iterator object to hang on to its state) becomes expensive. If the internal representation of a text pointer (I won't call it an iterator because that means something else in Python) is a byte offset or something similar, it shouldn't take up any more space than a Python int, which is what you'd be using anyway if you represented text positions by grapheme indexes or whatever. That's not necessarily true. Eg, in Emacs (there you go again), Lisp integers are not only immediate (saving one pointer), but the type is encoded in the lower bits, so that there is no need for a type pointer -- the representation is smaller than the opaque marker type. Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of 24 bytes on a 64-bit platform. Yes, yes, lisp is very clever. Maybe some other runtime, like PyPy, could make this optimization. But I don't think that anyone is filling up main memory with gigantic piles of character indexes and need to squeeze out that extra couple of bytes of memory on such a tiny object. Plus, this would allow such a user to stop copying the character data itself just to decode it, and on mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the bat. In Python it's true that markers can use the same data structure as integers and simply provide different methods, and it's arguable that Python's design is better. But if you use bytes internally, then you have problems. No, you just have design questions. Do you expose that byte value to the user? Yes, but only if they ask for it. It's useful for computing things like quota and the like. 
Can users (programmers using the language and end users) specify positions in terms of byte values? Sure, why not? If so, what do you do if the user specifies a byte value that points into a multibyte character? Go to the beginning of the multibyte character. Report that position; if the user then asks the requested marker object for its position, it will report that byte offset, not the originally-requested one. (Obviously, do the same thing for surrogate pair code points.) What if the user wants to specify position by number of characters? Part of the point that we are trying to make here is that nobody really cares about that use-case. In order to know anything useful about a position in a text, you have to have traversed to that location in the text. You can remember interesting things like the offsets of starts of lines, or the x/y positions of characters. Can you translate efficiently? No, because there's no point :). But you _could_ implement an overlay that cached things like the beginning of lines, or the x/y positions of interesting characters. As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count. But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module for *full conformance* in handling Unicode, on top of UTF-8. Still: why? I guess if I have some free time I'll try my hand at it, and maybe I'll run into a wall and realize you're right :). Any time you have an algorithm that requires efficient access to arbitrary text positions, you'll spend all your skull sweat fighting the representation. At least, that's been my experience with Emacsen. What sort of algorithm would that be, though? The main thing that I could think of is a text editor trying to efficiently allow the user to scroll to the middle of a large file without reading the whole thing into memory. 
But, in that case, you could use byte-positions to estimate, and display a heuristic number while calculating the real line numbers. (This is what 'less' does, and it seems to work well.) So I don't really see what you're arguing for here. How do *you* think positions in Unicode strings should be represented? I think what users should see is character positions, and they should be able to specify them numerically as well as via an opaque marker object. I don't care whether that position is represented as bytes or characters internally, except that the experience of Emacsen is that representation as byte positions is both inefficient and fragile. The representation as character positions is more robust but slightly more inefficient. Is it really the representation as byte positions which is fragile (i.e. the internal implementation detail), or the exposure of that position to calling code, and the idiomatic usage of that number as an integer?
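The "go to the beginning of the multibyte character" behavior proposed earlier in this exchange is cheap over UTF-8, because continuation bytes are self-identifying: they all match the bit pattern 10xxxxxx. A sketch:

```python
def snap_to_char_start(data: bytes, offset: int) -> int:
    # Walk backwards past UTF-8 continuation bytes (0b10xxxxxx) until
    # we land on the lead byte of the character containing `offset`.
    while offset > 0 and data[offset] & 0xC0 == 0x80:
        offset -= 1
    return offset

text = "naïve".encode("utf-8")            # 'ï' encodes as 0xC3 0xAF
assert snap_to_char_start(text, 3) == 2   # inside 'ï' -> its lead byte
assert snap_to_char_start(text, 4) == 4   # 'v' already starts a character
```

This is the whole cost of accepting an arbitrary byte value as a position: at most three backward steps, with no scan from the start of the text.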
Re: [Python-Dev] constant/enum type in stdlib
On Nov 23, 2010, at 10:37 AM, ben.cottr...@nominum.com wrote: I'd prefer not to think of the number of times I've made the following mistake: s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) If it's any consolation, it's fewer than the number of times I have :). (More fun, actually, is where you pass a file descriptor to the wrong argument of 'fromfd'...)
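The swap goes unnoticed precisely because both constants are plain integers (on Linux, AF_INET and SOCK_DGRAM even share the value 2). Distinct enum types make the mistake detectable; a minimal hypothetical sketch using IntEnum (which arrived later, in PEP 435, and which the real socket module does not use for argument checking):

```python
from enum import IntEnum

class AddressFamily(IntEnum):
    AF_INET = 2

class SocketKind(IntEnum):
    SOCK_DGRAM = 2

def make_socket(family, kind):
    # With distinct enum types, a type check can reject swapped
    # arguments even though both members share the integer value 2.
    if not isinstance(family, AddressFamily):
        raise TypeError("family must be an AddressFamily")
    if not isinstance(kind, SocketKind):
        raise TypeError("kind must be a SocketKind")
    return (family, kind)

make_socket(AddressFamily.AF_INET, SocketKind.SOCK_DGRAM)   # fine
```

Since Python 3.4 the stdlib does expose `socket.AddressFamily` and `socket.SocketKind` enums for readability, though the constructor still accepts bare ints for compatibility.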
Re: [Python-Dev] constant/enum type in stdlib
On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: Well, it is easy to assign range(N) to a tuple of names when desired. I don't think an automatically-enumerating constant generator is needed. I don't think that numerical enumerations are the only kind of constants we're talking about. Others have already mentioned strings. Also, see http://tm.tl/4671 for some other use-cases. Since this isn't coming to 2.x, we're probably going to do our own thing anyway (unless it turns out that flufl.enum is so great that we want to add another dependency...) but I'm hoping that the outcome of this discussion will point to something we can be compatible with.
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 23, 2010, at 7:22 PM, James Y Knight wrote: On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: Maybe Python should have used UTF-8 as its internal unicode representation. Then people who were foolish enough to assume one character per string item would have their programs break rather soon under only light unicode testing. :-) You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). I really hope that this idea is not just for new programming languages. If you switch from doing unicode wrong to doing unicode right in Python, you quadruple the memory footprint of programs which primarily store and manipulate large amounts of text. This is especially ridiculous in PyGTK applications, where the internal representation required by the GUI is UTF-8 anyway, so the round-tripping of string data back and forth to the exploded UTF-32 representation is wasting gobs of memory and time. It at least makes sense when your C library's idea about character width and your Python build match up. But, in a desktop app this is unlikely to be a performance concern; in servers, it's a big deal; measurably so. I am pretty sure that in the server apps that I work on, we are eventually going to need our own string type and UTF-8 logic that does exactly what James suggested - certainly if we ever hope to support Py3.
(I dimly recall that both James and I have made this point before, but it's pretty important, so it bears repeating.)
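The footprint claim is easy to check on a modern CPython, where (post-PEP 393) a str widens to four bytes per code point as soon as a single astral character appears; at the time of this thread the same effect came from wide UCS-4 builds. A rough measurement, numbers illustrative only:

```python
import sys

# A large, mostly-ASCII text with one astral character appended: the
# str must store four bytes per code point, while UTF-8 stays at about
# one byte per character for this data.
text = "x" * 1_000_000 + "\U0001F600"
wide = sys.getsizeof(text)                  # UCS-4 str: ~4 MB
narrow = sys.getsizeof(text.encode("utf-8"))  # UTF-8 bytes: ~1 MB
assert wide > 3 * narrow   # roughly the 4x blowup, minus fixed overhead
```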
Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote: On Tue, 23 Nov 2010 00:07:09 -0500 Glyph Lefkowitz gl...@twistedmatrix.com wrote: On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto ocean-c...@m2.ccsnet.ne.jp wrote: Hello. Does this affect python? Thank you. http://www.openssl.org/news/secadv_20101116.txt No. Well, actually it does, but Python links against the system OpenSSL on most platforms (except Windows), so it's up to the OS vendor to apply the patch. It does? If so, I must have misunderstood the vulnerability. Can you explain how it affects Python?
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote: James Y Knight writes: You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. A sad commentary on the state of Emacs usage, nobody. The theory is that accessing the first character of a region in a string often occurs as a primitive operation in O(N) or worse algorithms, sometimes without enough locality at the collection of regions level to give a reasonably small average access time. I'm not sure what you mean by the theory is. Whose theory? About what? In practice, any *Emacs user can tell you that yes, we do need to be able to access the Nth codepoint in a buffer in constant time. The O(N) behavior of current Emacs implementations means that people often use a binary coding system on large files. Yes, some position caching is done, but if you have a large file (eg, a mail file) which is virtually segmented using pointers to regions, locality gets lost. (This is not a design bug, this is a fundamental requirement: consider fast switching between threaded view and author-sorted view.) Sounds like a design bug to me. Personally, I'd implement fast switching between threaded view and author-sorted view the same way I'd address any other multiple-views-on-the-same-data problem. I'd retain data structures for both, and update them as the underlying model changed. 
These representations may need to maintain cursors into the underlying character data, if they must retain giant wads of character data as an underlying representation (arguably the _main_ design bug in Emacs, that it encourages you to do that for everything, rather than imposing a sensible structure), but those cursors don't need to be code-point counters; they could be byte offsets, or opaque handles whose precise meaning varied with the potentially variable underlying storage. Also, please remember that Emacs couldn't be implemented with giant Python strings anyway: crucially, all of this stuff is _mutable_ in Emacs. And of course an operation that sorts regions in a buffer using character pointers will have the same problem. Working with memory pointers, OTOH, sucks more than that; GNU Emacs recently bit the bullet and got rid of their higher-level memory-oriented APIs, all of the Lisp structures now work with pointers, and only the very low-level structures know about character-to-memory pointer translation. This performance issue is perceptible even on 3GHz machines with not so large (50MB) mbox files. It's *horrid* if you do something like occur on a 1GB log file, then try randomly jumping to detected log entries. Case in point: occur needs to scan the buffer anyway; you can't do better than linear time there. So you're going to iterate through the buffer, using one of the techniques that James proposed, and remember some locations. Why not just have those locations be opaque cursors into your data? In summary: you're right, in that James missed a spot. You need bidirectional, *copyable* iterators that can traverse the string by byte, codepoint, grapheme, or decomposed glyph.
Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto ocean-c...@m2.ccsnet.ne.jp wrote: Hello. Does this affect python? Thank you. http://www.openssl.org/news/secadv_20101116.txt No.
Re: [Python-Dev] Breaking undocumented API
On Nov 16, 2010, at 4:49 PM, Guido van Rossum wrote: PEP 8 isn't nearly visible enough, either. Whatever the rule is, it needs to be presented with the information itself. If the rule is that things not documented in the library manual have no compatibility guarantees, then all of the means of getting documentation *other* than looking at the library manual need to indicate this somehow (alternatively, the information shouldn't be duplicated, but I doubt I'll convince anyone of that). Assuming people actually read the disclaimers. I don't think it necessarily needs to be presented as a disclaimer. There will always be people who just ignore part of the information presented, but the message could be something along the lines of Here's some basic documentation, but it might be out-of-date or incomplete. You can find a better reference at http://helpful-hyperlink.example.com. If it's easy to click on the link, I think a lot of people will click on it. Especially since the library reference really _is_ more helpful than the docstrings, for the standard library. (IMHO, dir()'s semantics are so weird that it should emit a warning too, like looking for docs? please use help().)
Re: [Python-Dev] Breaking undocumented API
On Nov 10, 2010, at 2:21 PM, James Y Knight wrote: On the other hand, if you make the primary mechanism to indicate privateness be a leading underscore, that's obvious to everyone. +1. One of the best features of Python is the ability to make a conscious decision to break the interface of a library and just get on with your work, even if your use-case is not really supported, because nothing can stop you calling its private functionality. But, IMHO the worst problem with Python is the fact that you can do this _without realizing it_ and pay a steep maintenance price later when an upgrade of something springs the trap that you had unwittingly set for yourself. The leading-underscore convention is the only thing I've found that even mitigates this problem.
Re: [Python-Dev] Breaking undocumented API
On Nov 8, 2010, at 4:50 PM, Guido van Rossum wrote: On Mon, Nov 8, 2010 at 3:55 PM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: This seems like a pretty clear case of practicality beats purity. Not only has nobody complained about deprecatedModuleAttribute, but there are tons of things which show up in sys.modules that aren't modules in the sense of 'instances of ModuleType'. The Twisted reactor, for example, is an instance, and we've been doing *that* for about 10 years with no complaints. But the Twisted universe is only a subset of the Python universe. The Python stdlib needs to move more carefully. While this is true, I think the Twisted universe generally represents a particularly conservative, compatibility-conscious area within the Python universe (multiverse?). I know of several Twisted users who regularly upgrade to the most recent version of Twisted without incident, but can't move from Python 2.4-2.5 because of compatibility issues. That's not to say that there are no areas within the larger Python ecosystem that I'm unaware of where putting non-module-objects into sys.modules would cause issues. But if it were a practice that were at all common, I suspect that we would have bumped into it by now.
Re: [Python-Dev] Breaking undocumented API
On Nov 8, 2010, at 2:35 PM, exar...@twistedmatrix.com wrote: On 09:57 pm, br...@python.org wrote: On Mon, Nov 8, 2010 at 13:45, exar...@twistedmatrix.com wrote: On 09:25 pm, br...@python.org wrote: On Mon, Nov 8, 2010 at 13:03, exar...@twistedmatrix.com wrote: On 07:58 pm, br...@python.org wrote: I don't think a strict don't remove without deprecation policy is workable. For example, is trace.rx_blank constant part of the trace module API that needs to be preserved indefinitely? I don't even know if it is possible to add a deprecation warning to it, but CoverageResults._blank_re would certainly be a better place for it. The deprecation policy obviously cannot apply to module-level attributes. I'm not sure why this is. Can you elaborate? There is no way to directly trigger a DeprecationWarning for an attribute. We can still document it, but there is just no way to programmatically enforce it. What about `deprecatedModuleAttribute` (http://twistedmatrix.com/documents/current/api/twisted.python.deprecate.html) or zope.deprecation (http://docs.zope.org/zope3/Book/deprecation/show.html) which inspired it? Just checked the code and it looks like it substitutes the module for some proxy object? To begin with, that breaks subclass checks. After that I don't know the ramifications without really digging into the ModuleType code. That could be fixed if ModuleType allowed subclassing. :) For what it's worth, no one has complained about problems caused by `deprecatedModuleAttribute`, but we've only been using it for about two and a half years. This seems like a pretty clear case of practicality beats purity. Not only has nobody complained about deprecatedModuleAttribute, but there are tons of things which show up in sys.modules that aren't modules in the sense of 'instances of ModuleType'. The Twisted reactor, for example, is an instance, and we've been doing *that* for about 10 years with no complaints.
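The module-proxy technique under discussion can be sketched roughly as follows. This is a toy illustration of the idea, not the actual implementation of Twisted's `deprecatedModuleAttribute` or zope.deprecation: a ModuleType subclass is substituted into sys.modules and emits a DeprecationWarning whenever a listed attribute is touched.

```python
import sys
import types
import warnings

class DeprecatedAttrModule(types.ModuleType):
    """Module proxy that warns on access to deprecated attributes."""

    def __init__(self, wrapped, deprecated):
        super().__init__(wrapped.__name__)
        self.__dict__.update(wrapped.__dict__)
        self.__dict__["_deprecated"] = deprecated

    def __getattribute__(self, name):
        deprecated = types.ModuleType.__getattribute__(
            self, "__dict__").get("_deprecated", {})
        if name in deprecated:
            warnings.warn(f"{name} is deprecated: {deprecated[name]}",
                          DeprecationWarning, stacklevel=2)
        return types.ModuleType.__getattribute__(self, name)

# Build a throwaway module with an attribute we want to deprecate,
# then swap the proxy into sys.modules in its place.
mod = types.ModuleType("example")
mod.old_name = 42
sys.modules["example"] = DeprecatedAttrModule(mod, {"old_name": "use new_name"})

import example
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = example.old_name    # triggers the DeprecationWarning
```

Note that, as the thread points out, code doing `isinstance`/identity checks against the original module object would now see the proxy instead.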
Re: [Python-Dev] Pickle alternative in stdlib (Was: On breaking modules into packages)
On Nov 4, 2010, at 12:49 PM, Guido van Rossum wrote: What's the attack you're thinking of on marshal? It never executes any code while unmarshalling (although it can unmarshal code objects -- but the receiving program has to do something additionally to execute those). These issues may have been fixed now, but a long time ago I recall seeing some nasty segfaults which looked exploitable when feeding marshal malformed data. If they still exist, running a fuzzer on some pyc files should reveal them pretty quickly. When I ran across them I didn't think much of them, and probably did not even report the bug, since marshal is mostly used to load code anyway, which is implicitly trusted.
Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/
On Nov 3, 2010, at 1:04 PM, James Y Knight wrote: This is the strongest reason why I recommend to everyone I know that they not use pickle for storage they'd like to keep working after upgrades [not just of stdlib, but other 3rd party software or their own software]. :) +1. Twisted actually tried to preserve pickle compatibility in the bad old days, but it was impossible. Pickles should never really be saved to disk unless they contain nothing but lists, ints, strings, and dicts.
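A minimal illustration of why instance pickles rot across upgrades (a contrived in-process rename standing in for a real refactoring; the class and module names are hypothetical):

```python
# Pickles store the import path of the class; rename the class and every
# previously saved pickle becomes unloadable.
import pickle

class Point:                     # imagine this lives in myapp.models
    def __init__(self, x, y):
        self.x, self.y = x, y

blob = pickle.dumps(Point(1, 2))

Renamed = Point                  # simulate an upgrade that renamed the class
del Point

broke = False
try:
    pickle.loads(blob)
except AttributeError:           # "Can't get attribute 'Point' ..."
    broke = True

# Plain lists/ints/strings/dicts carry no import path, so they round-trip:
safe = pickle.loads(pickle.dumps({"x": 1, "y": 2}))
```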
Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/
On Nov 3, 2010, at 11:26 AM, Alexander Belopolsky wrote: This may not be a problem for smart tools, but for me and a simple editor what used to be: Maybe this is the real problem? It's 2010, we should all be far enough beyond EDLIN that our editors can jump to the definition of a Python class. Even Vim can be convinced to do this (http://rope.sourceforge.net/ropevim.html). Could Python itself make this easier? Maybe ship with a command that says hey, somewhere on sys.path, there is a class with this name. Please run '$EDITOR file +line' (or the current OS's equivalent) so I can look at the source code.
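The suggested helper is easy to prototype with inspect (`locate` is a hypothetical name; the '$EDITOR file +line' convention is the one from the message above):

```python
# Given a dotted name, find the file and line where the object is defined,
# so an editor could be launched at that spot.
import importlib
import inspect

def locate(dotted_name: str):
    modname, _, attr = dotted_name.rpartition(".")
    obj = getattr(importlib.import_module(modname), attr)
    filename = inspect.getsourcefile(obj)
    _, lineno = inspect.getsourcelines(obj)
    return filename, lineno

filename, lineno = locate("json.decoder.JSONDecoder")
# An editor could then be invoked as: $EDITOR <filename> +<lineno>
```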
Re: [Python-Dev] closing files and sockets in a timely manner in the stdlib
On Oct 30, 2010, at 2:39 PM, Jack Diederich wrote: On Fri, Oct 29, 2010 at 8:35 PM, Brett Cannon br...@python.org wrote: For those of you who have not noticed, Antoine committed a patch that raises a ResourceWarning under a pydebug build if a file or socket is closed through garbage collection instead of being explicitly closed. Just yesterday I discovered /proc/your PID here/fd/ which is a list of open file descriptors for your PID on *nix and includes all open files, pipes, and sockets. Very handy; I filed some tickets about company internal libs that were opening file handles as a side effect of import (logging mostly). I tried to provoke standard python imports (non-test) to leave some open handles and came up empty. That path (and anything below /proc, really) is a list of open file descriptors specifically on Linux, not *nix. Also on Linux, you can avoid your pid here by just doing /proc/self. A more portable (albeit not standard) path for what file descriptors do I have open is /dev/fd/. This is supported via a symlink to /proc/self on all the Linuxes I've tested on. There's no portable standard equivalent for not-yourself processes that I'm aware of, though. See more discussion here: http://twistedmatrix.com/trac/ticket/4522.
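Enumerating your own open descriptors the way the message describes is a one-liner (this relies on /dev/fd existing, which holds on Linux, macOS, and most BSDs but is not guaranteed by any standard):

```python
# List this process's open file descriptors via /dev/fd
# (on Linux this is a symlink to /proc/self/fd, as discussed above).
import os

fds = sorted(int(name) for name in os.listdir("/dev/fd"))
# stdin/stdout/stderr are normally among these, plus the fd used
# to read the directory itself.
```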
Re: [Python-Dev] Continuing 2.x
On Oct 28, 2010, at 10:51 PM, Brett Cannon wrote: I think people need to stop viewing the difference between Python 2.7 and Python 3.2 as this crazy shift and view it from python-dev's perspective; it should be viewed as one following from the other at this point. You can view it as Python 3.2 is the next version after Python 2.7 just like 2.7 followed 2.6, which makes the policies we follow for releases make total sense and negates this discussion. It just so happens people don't plan to switch to the newest release immediately as the backward-incompatible changes are more involved than what people are used to from past releases. Brett, with all due respect, this is not a reasonable position. You are making it sound like the popular view of 3.2 is a crazy shift is based on a personal dislike of python-dev or something. The fact is that the amount of effort required to port to 3.2 is extreme compared to previous upgrades, and most people still aren't willing to deal with it. It is a crazy shift. Let's take PyPI numbers as a proxy. There are ~8000 packages with a Programming Language::Python classifier. There are ~250 with Programming Language::Python::3. Roughly speaking, we can say that is 3% of Python code which has been ported so far. Python 3.0 was released at the end of 2008, so people have had roughly 2 years to port, which comes to 1.5% per year. Let's say that 20% of the code on PyPI is just junk; it's unfair to expect 100% of all code ever to get ported. But, still: with this back-of-the-envelope estimate of the rate of porting, it will take over 50 years before a decisive majority of Python code is on Python 3. By contrast, there are 536 packages with ::2.6, and 177 with ::2.7. (Trying to compare apples to apples here, since I assume the '2' tag is much more lightly used than '3' to identify supported versions; I figure someone likely to tag one micro-version would also tag the other.)
2.7 was released on July 3rd, so let's be generous and say approximately 6 months. That's 30% of packages, ported in 6 months, or 60% per year. This means that Python 3 is two orders of magnitude crazier of a shift than 2.7. I know that the methods involved in arriving at these numbers are not particularly good. But, I think that if their accuracy differs from that of the download stats, it's better: it takes a much more significant commitment to actually write some code and upload it than to accidentally download 3.x because it's the later version. Right now, Kristján is burning off his (non-fungible) enthusiasm in this discussion rather than addressing more 2.x maintenance issues. If 3.x adoption takes off and makes a nice hockey stick graph, then few people will care about this in retrospect. In the intervening hypothetical half-century while we wait to see how it pans out, isn't it better to just have an official Python branch for the maybe 2.8 release? Nobody from the current core team needs to work on it, necessarily; either other, new maintainers will show up or they won't. For that matter, Kristján is still talking about porting much of his work to 3.x anyway. In the best case (3.x takes over the world in 6 months) a 2.x branch won't be needed and nobody will show up to do the work of a release; some small amount of this work (the stuff not ported to 3.x) will be lost. In the medium case (3.x adoption is good, but there are still millions of 2.x users in 5 years) it will accumulate some helpers that will make migrating to 3.x even smoother than with 2.7. In the worst case (straw man: 3.x adoption actually declines, and distros start maintaining their own branches of 2.7) I'm sure everyone will be glad that some of this maintenance effort took place and there's some central place to continue it. I'm perfectly willing to admit that I'm still too pessimistic about this and I could be wrong.
But given the relatively minimal amount of effort required to let 2.x bugs continue to get fixed under the aegis of Python.org rather than going through the painful negotiation process of figuring out where else to host it (and thereby potentially losing a bunch of maintenance that would not otherwise happen), it seems foolhardy to insist that those of us who think 2.x is going to necessitate another release must necessarily be wrong.
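The back-of-the-envelope arithmetic in the message above, made explicit (the package counts are exactly the figures quoted there):

```python
# Porting-rate estimate from PyPI classifier counts quoted in the thread.
py3_tagged, total = 250, 8000
py3_share = py3_tagged / total            # ~3% of packages ported
py3_rate = py3_share / 2                  # over ~2 years -> ~1.5% per year

py27_tagged, py26_tagged = 177, 536
py27_share = py27_tagged / py26_tagged    # ~33% in ~6 months
py27_rate = py27_share * 2                # -> ~66% per year

speedup = py27_rate / py3_rate            # roughly 40x faster uptake for 2.7
```

(So the quoted "two orders of magnitude" is generous; the same numbers come out closer to a 40x difference.)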
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 9:55 PM, exar...@twistedmatrix.com wrote: Not only is the performance usually worse than expected, the behavior of aio_* functions requires all kinds of subtle and mysterious coordination with signal handling, which I'm not entirely sure Python would even be able to pull off without some modifications to the signal module. (And, as Jean-Paul mentioned, if your OS kernel runs out of space in a queue somewhere, completion notifications might just never be delivered at all.) Just to be clear, James corrected me there. I thought Jesus was talking about the mostly useless Linux AIO APIs, which have the problems I described. He was actually talking about the POSIX AIO APIs, which have a different set of problems making them a waste of time. I know, I'm referring to the behavior of POSIX AIO. Perhaps I'm overstating the case with 'subtle and mysterious', then, but the POSIX 'aiocb' structure still includes an 'aio_sigevent' member which is the way to find out about I/O event completion. If you're writing an application that uses AIO, basically all of your logic ends up living in the context of a signal handler, and as http://www.opengroup.org/onlinepubs/95399/functions/xsh_chap02_04.html#tag_02_04_01 puts it, When signal-catching functions are invoked asynchronously with process execution, the behavior of some of the functions defined by this volume of IEEE Std 1003.1-2001 is unspecified if they are called from a signal-catching function. Of course, you could try using signalfd(), but that's not in POSIX. (Or, you could use SIGEV_THREAD, but that would be functionally equivalent to running read() in a thread, except much more difficult.)
Re: [Python-Dev] Support for async read/write
On Oct 20, 2010, at 12:31 AM, Jeffrey Yasskin wrote: No comment on the rest of your claim, but this is a silly argument. The standard says the same thing about at least fcntl.h, signal.h, pthread.h, and ucontext.h, which clearly are useful. It was meant to be tongue-in-cheek :). Perhaps I should not have assumed that everyone else was as familiar with the POSIX documentation; I figured that most readers would know that most pages say that. But, that was the result of a string of many different searches attempting to find someone explaining why this was a good idea or why anyone would want to use it. I think in this case, it's accurate.
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 8:09 PM, James Y Knight wrote: There's a difference. os._exit is useful. os.open is useful. aio_* are *not* useful. For anything. If there's anything you think you want to use them for, you're wrong. It either won't work properly or it will be worse performing than the simpler alternatives. I'd like to echo this sentiment. This is not about providing a 'safe' wrapper to hide some powerful feature of these APIs: the POSIX aio_* functions are really completely useless. To quote the relevant standard http://www.opengroup.org/onlinepubs/95399/basedefs/aio.h.html: APPLICATION USAGE None. RATIONALE None. FUTURE DIRECTIONS None. Not only is the performance usually worse than expected, the behavior of aio_* functions requires all kinds of subtle and mysterious coordination with signal handling, which I'm not entirely sure Python would even be able to pull off without some modifications to the signal module. (And, as Jean-Paul mentioned, if your OS kernel runs out of space in a queue somewhere, completion notifications might just never be delivered at all.) I would love for someone to prove me wrong. In particular, I would really love for there to be a solution to asynchronous filesystem I/O better than start a thread, read until you block. But, as far as I know, there isn't, and wrapping these functions will just confuse and upset anyone who attempts to use them in any way.
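"Start a thread, read until you block" — the alternative endorsed above — is only a few lines. A sketch (with a throwaway temp file standing in for real input): a worker thread performs the blocking reads and hands completed chunks back over a queue.

```python
# The simple alternative to POSIX aio_*: blocking reads in a worker thread,
# with completions delivered over a queue instead of a signal handler.
import os
import queue
import tempfile
import threading

def async_read(path, chunk_size=4096):
    q = queue.Queue()
    def worker():
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                q.put(chunk)
        q.put(None)              # sentinel: end of file
    threading.Thread(target=worker, daemon=True).start()
    return q

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
    path = tmp.name

q = async_read(path)
chunks = []
while (chunk := q.get()) is not None:    # the "main loop" drains the queue
    chunks.append(chunk)
data = b"".join(chunks)
os.unlink(path)
```

In a real event loop you would poll the queue (or a pipe the worker writes to) instead of blocking on q.get(), but the structure is the same.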
Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
On Sep 18, 2010, at 10:18 PM, Steve Holden wrote: I could probably be persuaded to merge the APIs, but the email6 precedent suggests to me that separating the APIs better reflects the mental model we're trying to encourage in programmers manipulating text (i.e. the difference between the raw octet sequence and the text character sequence/parsed data). That sounds pretty sane and coherent to me. While I don't like the email6 precedent as such (that there would be different parsed objects, based on whether you started parsing with bytes or with strings), the idea that when you are working directly with bytes or text, you should have to know which one you have, is a good one. +1 for keeping the APIs separate with 'urlsplitb' etc.
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 4:51 PM, R. David Murray wrote: Given a message, there are many times you want to serialize it as text (for example, for presentation in a UI). You could provide alternate serialization methods to get text out on demand, but then what if someone wants to push that text representation back in to email to rebuild a model of the message? You tell them too bad, make some bytes out of that text. Leave it up to the application. Period, the end, it's not the library's job. If you pushed the text out to a 'view message source' UI representation, then the vicissitudes of the system clipboard and other encoding and decoding things may corrupt it in inscrutable ways. You can't fix it. Don't try. So now we have both a bytes parser and a string parser. Why do so many messages on this subject take this for granted? It's wrong for the email module just like it's wrong for every other package. There are plenty of other (better) ways to deal with this problem. Let the application decide how to fudge the encoding of the characters back into bytes that can be parsed. In the face of ambiguity, refuse the temptation to guess and all that. The application has more of an idea of what's going on than the library here, so let it make encoding decisions. Put another way, there's nothing wrong with having a text parser, as long as it just encodes the text according to some known encoding and then parses the bytes :). So, after much discussion, what we arrived at (so far!) is a model that mimics the Python3 split between bytes and strings. If you start with bytes input, you end up with a BytesMessage object. If you start with string input to the parser, you end up with a StringMessage. That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken.
The thing I care about, as a consumer of this API, is that there is a clearly defined Message interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire). I don't particularly care where those bytes came from. I don't care what decoding tricks were necessary to produce the characters. Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try. It may also be helpful to keep around a list of errors in the message, for inspection. But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around. (Looking at later messages as I'm about to post this, I think this all sounds pretty similar to Antoine's suggestions, with respect to keeping the implementation within a single class, and not having BytesMessage/UnicodeMessage at the same abstraction level.)
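The "text parser that just encodes the text and then parses the bytes" idea amounts to a one-line shim at the application level (`message_from_text` here is a hypothetical helper, not an email-package API; the caller, not the library, picks the encoding):

```python
# Application-level shim: the application chooses the encoding, and the
# library only ever parses bytes.
from email import message_from_bytes

def message_from_text(text, encoding="utf-8"):
    return message_from_bytes(text.encode(encoding))

msg = message_from_text("Subject: hello\r\n\r\nbody\r\n")
```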
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 7:34 PM, Barry Warsaw wrote: On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote: That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken. The thing I care about, as a consumer of this API, is that there is a clearly defined Message interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire). I don't particularly care where those bytes came from. I don't care what decoding tricks were necessary to produce the characters. But first you have to get to that Message interface. This is why the current email package separates parsing and generating from the representation model. You could conceivably have a parser that rot13's all the payload, or just parses the headers and leaves the payload as a blob of bytes. But the parser tries to be lenient in what it accepts, so that one bad header doesn't cause it to just punt on everything that follows. Instead, it parses what it can and registers a defect on that header, which the application can then reason about, because it has a Message object. If it were to just throw up its hands (i.e. raise an exception), you'd basically be left with a blob of useless crap that will just get /dev/null'd. Oh, absolutely. Please don't interpret anything I say as meaning that the email API should not handle broken data. I'm just saying that you should not expect broken data to round-trip through translation to characters and back, any more than you should expect a broken PNG to round-trip through a translation to a 2d array of pixels and back. Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try. 
It may also be helpful to keep around a list of errors in the message, for inspection. But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around. Right. The middle ground IMO is what the current parser does. It recognizes the problem, registers a defect, and tries to recover, but it doesn't fix the corrupt data. So for example, if you had a valid RFC 2047 encoded Subject but a broken X-Foo header, you'd at least still end up with a Message object. The value of the good headers would be things from which you can get the unicode value, the raw bytes value, parse its parameters, munge it, etc. while the bad header might be something you can only get the raw bytes from. My take on this would be that you should always be able to get bytes or characters, but characters are always suspect, in that once you've decoded, if you had invalid bytes, then they're replacement characters (or your choice of encoding fix).
Re: [Python-Dev] Garbage announcement printed on interpreter shutdown
On Sep 10, 2010, at 5:10 PM, Amaury Forgeot d'Arc wrote: 2010/9/10 Fred Drake fdr...@acm.org: On Fri, Sep 10, 2010 at 4:32 PM, Georg Brandl g.bra...@gmx.net wrote: IMO this runs contrary to the decision we made when DeprecationWarnings were made silent by default: it spews messages not only at developers, but also at users, who don't need it and probably are going to be quite confused by it, Agreed; this should be silent by default. +1. I suggest to enable it only when Py_DEBUG (or Py_TRACE_REFS or Py_REF_DEBUG?) is defined. Would it be possible to treat it the same way as a deprecation warning, and show it under the same conditions? It would be nice to know if my Python program is leaking uncollectable objects without rebuilding the interpreter.
Re: [Python-Dev] Internal counter to debug leaking file descriptors
On Aug 31, 2010, at 10:03 AM, Guido van Rossum wrote: On Linux you can look somewhere in /proc, but I don't know that it would help you find where a file was opened. /dev/fd is actually a somewhat portable way of getting this information. I don't think it's part of a standard, but on Linux it's usually a symlink to /proc/self/fd, and it's available on MacOS and most BSDs (based on a hasty and completely-not-comprehensive investigation). But it won't help you find out when the FDs were originally opened, no.
Re: [Python-Dev] 'hasattr' is broken by design
On Aug 24, 2010, at 8:31 AM, Benjamin Peterson wrote: 2010/8/24 Hrvoje Niksic hrvoje.nik...@avl.com: The __length_hint__ lookup expects either no exception or AttributeError, and will propagate others. I'm not sure if this is a bug. On the one hand, throwing anything except AttributeError from __getattr__ is bad style (which is why we fixed the bug by deriving our business exception from AttributeError), but the __length_hint__ check is supposed to be an internal optimization completely invisible to the caller of list(). __length_hint__ is internal and undocumented, so it can do whatever it wants. As it happens though, list() is _quite_ public. Saying X is internal and undocumented, so it can do whatever it wants is never really realistic, especially in response to someone saying we already saw this problem in production, _without_ calling / referring to / knowing about this private API.
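The fix described above — deriving the business exception from AttributeError — looks like this in miniature (class names are hypothetical; the contrast with a plain RuntimeError shows why the subclassing matters):

```python
# An AttributeError subclass is swallowed by hasattr() and by internal
# special-method probes; any other exception type escapes from them.
class BusinessError(AttributeError):
    """Domain error that also behaves as a missing-attribute signal."""

class Record:
    def __getattr__(self, name):
        raise BusinessError(f"no such field: {name}")

assert not hasattr(Record(), "anything")   # swallowed: AttributeError subclass

class BadRecord:
    def __getattr__(self, name):
        raise RuntimeError(f"no such field: {name}")

escaped = False
try:
    hasattr(BadRecord(), "anything")       # non-AttributeError propagates
except RuntimeError:
    escaped = True
```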
Re: [Python-Dev] Fixing #7175: a standard location for Python config files
On Aug 12, 2010, at 6:30 AM, Tim Golden wrote: I don't care how many stats we're doing You might not, but I certainly do. And I can guarantee you that the authors of command-line tools that have to start up in under ten seconds, for example 'bzr', care too.
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
On Aug 3, 2010, at 4:28 AM, M.-A. Lemburg wrote: I don't think that's a problem: the SQLite database would be a cache like e.g. a font cache or TCSH command cache, not a replacement of the meta files stored in directories. Such a database would solve many things at once: faster access to the meta-data of installed packages, fewer I/O calls during startup, more flexible ways of doing queries on the meta-data, needed for introspection and discovery, etc. This is exactly what Twisted already does with its plugin cache, and the previously-cited ticket in this thread should expand the types of metadata which can be obtained about plugins. Packaging systems are perfectly capable of generating and updating such metadata caches, but various packagers of Twisted (Debian's especially) didn't read our documentation and kept moving around the place where Python source files were installed, which routinely broke the post-installation hooks and caused all kinds of problems. I would strongly recommend looping in the Python packaging teams from various distros *before* adding another such cache, unless you want to be fielding bugs from Launchpad.net for five years :).
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
On Aug 2, 2010, at 9:53 AM, exar...@twistedmatrix.com wrote: On 01:27 pm, m...@egenix.com wrote: exar...@twistedmatrix.com wrote: On 12:21 pm, m...@egenix.com wrote: See Zope for an example of how well this simple mechanism works out in practice: it simply scans the Products namespace for sub-packages and then loads each sub-package it finds to have it register itself with Zope. This is also roughly how Twisted's plugin system works. One drawback, though, is that it means potentially executing a large amount of Python in order to load plugins. This can build up to a significant performance issue as more and more plugins are installed. I'd say that it's up to the application to deal with this problem. An application which requires lots and lots of plugins could define a registration protocol that does not require loading all plugins at scanning time. It's not fixable at the application level, at least in Twisted's plugin system. It sounds like Zope's system has the same problem, but all I know of that system is what you wrote above. The cost increases with the number of plugins installed on the system, not the number of plugins the application wants to load. We do have a plan to address this in Twisted's plugin system (eventually): http://twistedmatrix.com/trac/ticket/3773, although I'm not sure if that's relevant to the issue at hand.
Re: [Python-Dev] proto-pep: plugin proposal (for unittest)
On Aug 1, 2010, at 3:52 PM, Ronald Oussoren wrote: On 1 Aug, 2010, at 17:22, Éric Araujo wrote: Speaking of which... Your documentation says it's named ~/unittest.cfg, could you make this a file in the user base (that is, the prefix where 'setup.py install --user' will install files)? Putting .pydistutils.cfg .pypirc .unittest2.cfg .idlerc and possibly others in the user home directory (or %APPDATA% on win32 and what-have-you on Mac) is unnecessary clutter. However, $PYTHONUSERBASE is not the right directory for configuration files, as pointed out in http://bugs.python.org/issue7175 It would be nice to agree on a ~/.python (resp. %APPDATA%/Python) or $XDG_CONFIG_HOME/python directory and put config files there. ~/Library/Python would be a good location on OSX, even if the 100% formally correct location would be ~/Preferences/Python (at least for framework builds; unix-style builds may want to follow the unix convention). 100% formally speaking, MacOS behaves like UNIX in many ways. http://en.wikipedia.org/wiki/Single_UNIX_Specification#Mac_OS_X_and_Mac_OS_X_Server It's fine to have a mac-pathname-convention-following place for such data, but please _also_ respect the UNIX-y version on the Mac. The only possible outcome of Python on the Mac respecting only Mac pathnames is to have automation scripts that work fine on BSD and Linux, but then break when you try to run them on a Mac. There is really no benefit to intentionally avoiding honoring the UNIX conventions. (For another example, note that although Python resides in /System/Library, on the mac, the thing that's in your $PATH when you're using a terminal is the symlink in /usr/bin/python.) Also, no, ~/Preferences isn't the right place for it either; there's no such thing. You probably meant ~/Library/Preferences. I'd say that since ~/Library/Python is already used, there's no particular reason to add a new ~/Library/Preferences/Python location.
After all, if you really care a lot about platform conventions, you should put it in ~/Library/Preferences/org.python.distutils.plist, but I don't see what benefit that extra complexity would have for anyone. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python signal processing question
On Jul 22, 2010, at 12:00 AM, Stephen J. Turnbull wrote: My understanding of OSError is that the OS is saying "sorry, what you tried to do is perfectly reasonable under some circumstances, but you can't do that now". ENOMEM, EPERM, ENOENT etc fit this model. RuntimeError OTOH is basically saying "You should know better than to try that!" EINVAL fits this model. That is not my understanding of OSError at all, especially given that I have seen plenty of OSErrors that have EINVAL set by various things. OSError's docstring specifically says "OS system call failed.", and that's the way I've always understood it: you made a syscall and got some kind of error. Python _mostly_ avoids classifying OSErrors into different exception types in other APIs. The selection of RuntimeError in this particular case seems somewhat random and ad hoc, given that out-of-range signal values give ValueError while SIGKILL and SIGSTOP give RuntimeError. The RuntimeError's args start with 22 (which I assume is supposed to mean EINVAL) but it doesn't have an 'errno' attribute as an OSError would. The ValueError doesn't relate to an errno at all. Nowhere does the documentation say "raises OSError or ValueError or TypeError or RuntimeError whose args[0] may be an errno". To be clear, this particular area doesn't bother me. I've been dealing with weird and puzzling signal-handling issues in Python for years and years and this dusty corner of the code has never come up. I did want to reply to this particular message, though, because I *would* eventually like the exception hierarchy raised by certain stdlib functions to be more thoroughly documented and coherent, but a prerequisite to that is to avoid rationalizing the random potpourri of exception types that certain parts of the stdlib emit. I think signal.signal is one such part.
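The inconsistency described is easy to observe directly. One caveat for modern readers: this is a 2010-era discussion, and since Python 3.3 the RuntimeError for SIGKILL/SIGSTOP became an OSError, so the sketch below catches both (POSIX only):

```python
import signal

# Out-of-range signal numbers raise ValueError...
try:
    signal.signal(999, signal.SIG_IGN)
    range_error = None
except ValueError as e:
    range_error = type(e).__name__

# ...while SIGKILL's handler simply can't be changed; historically this
# raised RuntimeError (with 22/EINVAL as args[0]), now OSError.
try:
    signal.signal(signal.SIGKILL, signal.SIG_IGN)
    kill_error = None
except (RuntimeError, OSError) as e:
    kill_error = type(e).__name__
```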
Re: [Python-Dev] What to do with languishing patches?
On Jul 18, 2010, at 1:46 PM, Alexander Belopolsky wrote: We already have "postponed" and "remind" resolutions, but these are exclusive of "accepted". I think there should be a clear way to mark the issue "accepted and would be applied if X.Y was out already". Chances are one of the resolution labels already has such meaning, but in this case it should be more prominently documented as such. This is what branches are for. When the X.Y release cycle starts, there should be a branch for X.Y. Any "would be applied" patches can simply be applied to trunk without interrupting anything; the X.Y release branch can be merged back into trunk as necessary.
Re: [Python-Dev] avoiding accidental shadowing of top-level libraries by the main module
On Jul 13, 2010, at 5:02 PM, Nick Coghlan wrote: My concerns aren't about a module reimporting itself directly, they're about the case where a utility module is invoked as __main__ but is also imported normally somewhere else in a program (e.g. pdb is invoked as a top-level debugger, but is also imported directly for some reason). Currently that works as a non-circular import and will only cause hassles if there is top-level state in the affected module that absolutely must be a singleton within a given application. Either change (disallowing it completely as you suggest, or making it a circular import, as I suggest) runs the risk of breaking code that currently appears to work correctly. Fred's point about the practice of changing __name__ in the main module corrupting generated pickles is one I hadn't thought of before though. It's not just pickle; anything that requires __name__ (or __module__) to be accurate for introspection or debugging is also problematic. I have long considered it a 'best practice' (ugh, I hate that phrase, but I can't think of what else to call it) to _always_ do this type of shadowing, and avoid defining _any_ names in the __name__ == '__main__' case, so that there's no ambiguity: http://glyf.livejournal.com/60326.html ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
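The practice described above — keeping the `__name__ == '__main__'` block free of definitions, so that no state ends up living under two module names — can be sketched as follows. The module name `mymodule` is hypothetical; the sketch writes the module to a temp directory and runs it as a script to show that `main()` executes in the *imported* copy, with an accurate `__name__`:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module following the pattern: the __main__ block defines
# nothing, it just re-imports the module under its real name and dispatches.
module_source = textwrap.dedent("""
    _state = []              # module-level singleton state

    def main():
        _state.append("ran")
        print(__name__, len(_state))

    if __name__ == "__main__":
        import mymodule      # re-import under the real name...
        mymodule.main()      # ...so all state lives in one module object
""")

d = tempfile.mkdtemp()
with open(os.path.join(d, "mymodule.py"), "w") as f:
    f.write(module_source)

result = subprocess.run(
    [sys.executable, os.path.join(d, "mymodule.py")],
    capture_output=True, text=True, cwd=d,
)
```

Running the file prints `mymodule 1`, not `__main__ 1`: the work happened in the canonically-named module, so pickles and introspection see accurate `__module__` values and `_state` exists exactly once.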
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 12, 2010, at 4:34 AM, Éric Araujo wrote: Plus, http://twistedmatrix.com/trac/report/15 is a useful resource for core developers with only a little bit of free time to do a review. Title: “Review Tickets, By Order You Should Review Them In” I haven’t found a description of this order, can you explain? Thanks. Part of the reason that the report is worded that way is that we may decide that the order should be different, but it will still be the order that you should review them in :). Right now the order is amount of time since last change, sorted from highest to lowest. In other words, first come, first serve, by last activity. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 12, 2010, at 11:36 AM, Reid Kleckner wrote: (Somewhat off-topic): Another pain point students had was accidentally shadowing stdlib modules, like random. Renaming the file didn't solve the problem either, because it left behind .pycs, which I had to help them delete. I feel your pain. It seems like every third person who starts playing with Twisted starts off by making a file called 'twisted.py' and then getting really confused by the behavior. I would love it if this could be fixed, but I haven't yet thought of a solution that would be less confusing than the problem itself.
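The shadowing problem described here can be reproduced deliberately, which makes it easier to explain to students: a file named `random.py` whose directory comes first on `sys.path` (as a script's own directory does) wins over the standard library module.

```python
import os
import sys
import tempfile

# Create a file that shadows the stdlib's random module.
d = tempfile.mkdtemp()
with open(os.path.join(d, "random.py"), "w") as f:
    f.write("value = 'shadowed'\n")

sys.path.insert(0, d)
sys.modules.pop("random", None)   # forget any already-imported copy
import random                     # finds the shadowing file, not the stdlib
shadowed = getattr(random, "value", None)

# Undo the damage so the real module is importable again.
sys.path.pop(0)
sys.modules.pop("random", None)
```

The stale-`.pyc` variant of the problem is the same mechanism: the compiled file sits in the script directory and keeps winning the `sys.path` race even after the source is renamed.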
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 12, 2010, at 5:47 PM, Fred Drake wrote: On Mon, Jul 12, 2010 at 5:42 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: I'm sure Brett will love this idea, but if it was impossible to reimport the script being executed as __main__ with a different name it would solve these problems. Indeed! And I'd be quite content with such a solution, since I consider scripts and modules to be distinct. but ... isn't the whole point of 'python -m' to make scripts and modules _not_ be distinct?___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 11, 2010, at 10:22 AM, Tal Einat wrote: Most of the responses up to this point have been strongly against my proposal. The main reason given is that it is nice to have a graphical IDE supported out-of-the-box with almost any Python installation. This is especially important for novice programmers and in teaching environments. I understand this sentiment, but I think that supplying a quirky IDE with many caveats, lacking documentation, some bugs and a partially working debugger ends up causing more confusion than good. The people who are actually *in* those environments seem to disagree with you :). I think you underestimate the difficulty of getting software installed and overestimate the demands of new Python users and students. While I don't ever use IDLE if there's an alternative available, I have been very grateful many times for its presence in environments where it was a struggle even to say install Python. A workable editor and graphical shell is important, whatever its flaws. (And I think you exaggerate IDLE's flaws just a bit.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 2:37 PM, Martin v. Löwis wrote: Initially (five years ago!) I tried to overcome these issues by improving IDLE, solving problems and adding a few key features. Without going into details, suffice to say that IDLE hasn't improved much since 2005 despite my efforts. For example, see http://bugs.python.org/issue1529142, where it took nearly 3 years to fix a major issue from the moment I posted the first workaround. For another example, see http://bugs.python.org/issue3068, where I posted a patch for an extension configuration dialog over two years ago, and it hasn't received as much as a sneeze in response. I can understand that this is frustrating, but please understand that this is not specific to your patches, or to IDLE. Many other patches on bugs.python.org remain unreviewed for many years. That's because many of the issues are really tricky, and there are very few people who both have the time and the expertise to evaluate them. This problem seems to me to be the root cause here. Guido proposes to give someone interested in IDLE commit access, and hopefully that will help in this particular area. But, as I recall, at the last language summit there was quite a bit of discussion about how to address the broader issue of patches falling into a black hole. Is anybody working on it? (This seems to me like an area where a judicious application of PSF funds might help; if every single bug were actively triaged and responded to, even if it weren't reviewed, and patch contributors were directed to take specific steps to elicit a response or a review, the fact that patch reviews take a while might not be so bad.) FWIW, I don't consider a few months as a long time for a patch review. It may not be a long time compared to other patch reviews, but it is a very long time for a volunteer to wait for something, especially if that something is any indication that the python developers care that this patch was submitted at all. 
There seems to be at least one thread a month on this list from a disgruntled community member complaining (directly or indirectly) about this delay. I think that makes it a big problem. At the moment, I'm personally able to perhaps review one issue per week (sometimes less); at this rate, it'll take several years until I get to everything. I guess it depends what you mean by everything, but given that the open bug count is actually increasing at a significant rate, I would say that you can never possibly get to everything. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 3:19 PM, Martin v. Löwis wrote: Unfortunately, it's often not clear what the submitter wants: does she want to help, or want to get help? For a bug report, I often post a message can you provide a patch?, but sometimes, it isn't that clear. Perhaps this is the one area where the biggest advance could be made: a clarification of the workflow. My experience with Python issues which have been triaged is that everyone who triages tickets has a slightly different idea of who is responsible for the ticket and what they're supposed to do next at every point in the process. Triage, as described on http://www.python.org/dev/workflow/, emphasizes making sure that all fields in the issue tracker are properly set, rather than on communicating with the contributor or reporter. On Twisted, we try to encourage triagers to focus on communicating the workflow ramifications of what a particular contributor has done. We try to provide a response to the bug reporter or patch submitter that says thanks, but in order to move this along, you need to go through the following steps and sometimes even attach a link to the workflow document pointing out exactly where in the process the ticket is now stuck. (At least, that's what we're trying to do.) This involves a lot of repeating ourselves in ticket comments, but it's well worth it (and as more of the repetition moves into citing links to documents that have been written to describe aspects of the workflow, it's less onerous). http://www.python.org/dev/workflow/ describes what the steps are, but it's in a sort of procedural passive voice that doesn't say who is responsible for doing reviews or how to get a list of patches which need to be reviewed or what exactly a third-party non-core-committer reviewer should do to remove the 'Patch review' keyword. 
http://twistedmatrix.com/trac/wiki/TwistedDevelopment#SubmittingaPatch and http://twistedmatrix.com/trac/wiki/ReviewProcess meander around a bit, but a while ago we re-worked them so that each section has a specific audience (authors, reviewers, or external patch submitters) and that helped readers understand what they're intended to do. Plus, http://twistedmatrix.com/trac/report/15 is a useful resource for core developers with only a little bit of free time to do a review. (I'm just offering some suggestions based on what I think has worked, not to hold Twisted up as a paragon of a perfect streamlined process. We still have folks complain about stuck patches, these documents are _far_ from perfect, and there are still some varying opinions about how certain workflow problems should be dealt with and differences in quality of review. Plus, we have far fewer patches to deal with than Python. Nevertheless, the situation used to be worse for us, and these measures seem to have helped.)___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 5:33 PM, Georg Brandl wrote: Honestly, how would you feel as a committer to have scores of issues assigned to you -- as a consequence of speedy triage -- knowing that you have to invest potentially hours of volunteer time into them, while the person doing the triaging is done with the bug in a few minutes and paid for it? I'd feel a little bit duped. That doesn't strike me as a particularly useful type of triage. The most useful type of triage in this case would be the kind where the bug gets re-assigned to the *original contributor*, not a core committer, with a message clearly saying thanks! but we will not do anything further with this ticket until *you* do XYZ. This may result in some tickets getting left by wayside, but at least it will be clear that they have been left by the wayside, and whose responsibility they really are. Even so, I would certainly feel better having scores of issues assigned to me than I would feel having scores of issues that are just hanging out in limbo forever. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On Jul 6, 2010, at 8:09 AM, Steven D'Aprano wrote: You've never used Apple's much-missed Hypertalk, have you? :)

on mailingListMessage
    get the message
    put it into aMessage
    if the thread of aMessage contains "license wankery" then
        put aMessage into the trash
    end if
end mailingListMessage
Re: [Python-Dev] Can Python implementations reject semantically invalid expressions?
On Jul 2, 2010, at 12:28 AM, Steven D'Aprano wrote: This question was inspired by something asked on #python today. Consider it a hypothetical, not a serious proposal. We know that many semantic errors in Python lead to runtime errors, e.g. 1 + "1". If an implementation rejected them at compile time, would it still be Python? E.g. if the keyhole optimizer raised SyntaxError (or some other exception) on seeing this: def f(): return 1 + "1" instead of compiling something which can't fail to raise an exception, would that still be a legal Python implementation? I'd say no. Python has defined semantics in this situation: a TypeError is raised. To me, this seems akin to a keyhole optimizer arbitrarily deciding that raise TypeError() should cause the compiler to abort. If this type of expression were common, it would be within the rights of, for example, a Python JIT to generate a fast path through 'f' that wouldn't bother to actually invoke its 'int' type's '__add__' method, since there is no possible way for a Python program to tell the difference, since int.__add__ is immutable.
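The defined semantics being argued for are easy to check: the definition compiles without complaint, and the TypeError appears only when the body actually runs.

```python
def f():
    return 1 + "1"   # compiles fine; no error at definition time

try:
    f()              # the TypeError only exists at call time
    err = None
except TypeError as e:
    err = type(e).__name__
```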
Re: [Python-Dev] thoughts on the bytes/string discussion
On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: Regarding the proposal of a String ABC, I hope this isn't going to become a backdoor to reintroduce the Python 2 madness of allowing equivalency between text and bytes for *some* strings of bytes and not others. For my part, what I want out of a string ABC is simply the ability to do application-specific optimizations. There are many applications where all input and output is text, but _must_ be UTF-8. Even GTK uses UTF-8 as its native text representation, so output could just be display. Right now, in Python 3, the only way to be correct about this is to copy every byte of input into 4 bytes of output, then copy each code point *back* into a single byte of output. If all your application does is rewrite the occasional XML attribute, for example, this cost can be significant, if not overwhelming. I'd like a version of 'decode' which would give me a type that was, in every respect, unicode, and responded to all protocols exactly as other unicode objects (or str objects, if you prefer py3 nomenclature ;-)) do, but wouldn't actually copy any of that memory unless it really needed to (for example, to pass to a C API that expected native wide characters), and that would hold on to the original bytes so that it could produce them on demand if encoded to the same encoding again. So, as others in this thread have mentioned, the 'ABC' really implies some stuff about C APIs as well. I'm not sure about the exact performance impact of such a class, which is why I'd like the ability to implement it *outside* of the stdlib and see how it works on a project, and return with a proposal along with some data. There are also different ways to implement this, and other optimizations (like ropes) which might be better. 
You can almost do this today, but the lack of things like the hypothetical __rcontains__ does make it impossible to be totally transparent about it.___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
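A deliberately tiny sketch of the lazily-decoded string idea described above. The class name and interface are hypothetical, and a real version would have to proxy the full str protocol transparently — which is exactly where the missing `__rcontains__` hook bites:

```python
class LazyUTF8:
    """Holds UTF-8 bytes; decodes to str only if someone needs characters."""

    def __init__(self, raw):
        self._raw = raw       # keep the original bytes around
        self._text = None     # decoded lazily, at most once

    def __str__(self):
        if self._text is None:
            self._text = self._raw.decode("utf-8")
        return self._text

    def encode(self, encoding="utf-8"):
        if encoding == "utf-8" and self._text is None:
            return self._raw  # round-trip without ever copying into a str
        return str(self).encode(encoding)

s = LazyUTF8("héllo".encode("utf-8"))
same_bytes = s.encode()       # no decode has happened yet
text = str(s)                 # decode on demand
```

The UTF-8-in, UTF-8-out path never pays the 1-byte-to-4-bytes-and-back cost the message complains about; only callers that genuinely need characters trigger the decode.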
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 8:57 PM, Robert Collins wrote: bzr has a cache of decoded strings in it precisely because decode is slow. We accept slowness encoding to the user's locale because that's typically much less data to examine than we've examined while generating the commit/diff/whatever. We also face memory pressure on a regular basis, and that has been, at least partly, due to UCS4 - our translation cache helps there because we have fewer duplicate UCS4 strings. Thanks for setting the record straight - apologies if I missed this earlier in the thread. It does seem vaguely familiar.
Re: [Python-Dev] email package status in 3.X
On Jun 23, 2010, at 8:17 AM, Steve Holden wrote: Guido van Rossum wrote: On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote: Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates. FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected. +1 The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line. For sure. I don't speak for Tres, but I don't think he was talking about optimism about *adoption*, overall; rather, optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :). There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation.
Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: The RFC says that URIs are text, and therefore they can (and IMO should) be operated on as text in the stdlib. No, *blue* is the best color for a shed. Oops, wait, let me try that again. While I broadly agree with this statement, it is really an oversimplification. An URI is a structured object, with many different parts, which are transformed from bytes to ASCII (or something latin1-ish, which is really just bytes with a nice face on them) to real, honest-to-goodness text via the IRI specification: http://tools.ietf.org/html/rfc3987. Note also that the complete solution argument cuts both ways. Eg, a complete solution should implement UTS 39 confusables detection[1] and IDNA[2]. Good luck doing that with bytes! And good luck doing that with just characters, too. You need a parsed representation of the URI that you can encode different parts of in different ways. (My understanding is that you should only really implement confusables detection in the netloc... while that may be a bogus example, you're certainly only supposed to do IDNA in the netloc!) You can just call urlsplit() all over the place to emulate this, but this does not give you the ability to go back to the original bytes, and thereby preserve things like brokenly-encoded segments, which seems to be what a lot of this hand-wringing is about. To put it another way, there is no possible information-preserving string or bytes type that will make everyone happy as a result from urljoin(). The only return-type that gives you *everything* is URI. just using 'latin-1' as the encoding allows you to use the (unicode) string operations internally, and then spew your mess out into the world for someone else to clean up, just as using bytes would. This is the limitation that everyone seems to keep dancing around. If you are using the stdlib, with functions that operate on sequences like 'str' or 'bytes', you need to choose from one of three options: 1. 
decode everything to latin1 (although I prefer to call it charmap when used in this way) so that you can have some mojibake that will fool a function that needs a unicode object, but not lose any information about your input so that it can be transformed back into exact bytes (and be very careful to never pass it somewhere that it will interact with real text!), 2. actually decode things to an appropriate encoding to be displayed to the user and manipulated with proper text-manipulation tools, and throw away information about the bytes, 3. keep both the bytes and the characters together (perhaps in a data structure) so that you can both display the data and encode it in situationally-appropriate ways. The stdlib as it is today is not going to handle the 3rd case for anyone. I think that's fine; it is not the stdlib's job to solve everyone's problems. I've been happy with it providing correctly-functioning pieces that can be used to build more elaborate solutions. This is what I meant when I said I agree with Stephen's first point: the stdlib *should* just keep operating entirely on strings, because URIs are defined, by the spec, to be sequences of ASCII characters. But that's not the whole story. PJE's bstr and ebytes proposals set my teeth on edge. I can totally understand the motivation for them, but I think it would be a big step backwards for python 3 to succumb to that temptation, even in the form of a third-party library. It is really trying to cram more information into a pile of bytes than truly exists there. (Also, if we're going to have encodings attached to bytes objects, I would very much like to add JPEG and FLAC to the list of possibilities.) The real tension there is that WSGI is desperately trying to avoid defining any data structures (i.e. classes), while still trying to work with structured data. An URI class with a 'child' method could handily solve this problem. 
You could happily call IRI(...).join(some bytes).join(some text) and then just say give me some bytes, it's time to put this on the network, or give me some characters, I have to show something to the user, or even give me some characters appropriate for an 'href=' target in some HTML I'm generating - although that last one could be left to the HTML generator, provided it could get enough information from the URI/IRI object's various parts itself. I don't mean to pick on WSGI, either. This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
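A small illustration of "different parts, encoded in different ways" using only stdlib pieces (this is not the proposed URI/IRI class, just a sketch of the underlying point): IDNA applies to the host, while percent-escaping applies per path segment.

```python
from urllib.parse import quote, urlsplit

# A hypothetical IRI with non-ASCII in both the host and the path.
parts = urlsplit("http://bücher.example/söme path/index.html")

# The netloc gets IDNA; applying IDNA to the path would be nonsense.
host = parts.hostname.encode("idna")

# Path segments get percent-escaping, segment by segment (avoiding the
# %2F confusion mentioned above by never escaping across separators).
path = "/".join(quote(seg) for seg in parts.path.split("/"))
```

A parsed representation like `parts` is what lets each component pick its own byte-level encoding; a flat str or bytes value cannot express that.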
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. Thanks Glyph. That is a nice summary of one kind of challenge facing programmers. Ironically, Glyph also described the pain in 2.x: it only kinda worked. It was not my intention to be ironic about it - that was exactly what I meant :). 3.x is forcing you to confront an issue that you _should_ have confronted for 2.x anyway. (And, I hope, most libraries doing a 3.x migration will take the opportunity to make their 2.x APIs unicode-clean while still in 2to3 mode, and jump ship to 3.x source only _after_ there's a nice transition path for their clients that can be taken in 2 steps.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 2:07 PM, James Y Knight wrote: Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) But you _do_ need to decode it in this case. If you got your URL from some funky UTF-32 datasource, b"\x00\x00\x00/" is not a path separator, "/" is. Plus, you should really be separating path segments and looking at them individually so that you don't fall victim to %2F bugs. And if you want your code to be portable, you need a Unicode representation of your pathname anyway for Windows; plus, there, you need to care about \ as well as /. The fact that your wire-bytes were probably ASCII(-ish) and your filesystem probably encodes pathnames as UTF-8 and so everything looks like it lines up is no excuse not to be explicit about your expectations there. You may want to transcode your characters into some other characters later, but that shouldn't stop you from treating them as characters of some variety in the meanwhile. The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) I can think of lots of optimizations that might be interesting for Python (or perhaps some other runtime less concerned with cleverness overload, like PyPy) to implement, like a UTF-8 combining-characters overlay that would allow for fast indexing, lazily populated as random access dictates.
But this could all be implemented as smartness inside .encode() and .decode() and the str and bytes types without changing the way the API works. I realize that there are implications at the C level, but as long as you can squeeze a function call into certain places, it could still work. I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things.
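The surrogateescape workaround mentioned above can be seen in a short round-trip. This sketch is not from the original thread; it is just a minimal demonstration of the PEP 383 error-handler semantics:

```python
# Round-tripping possibly-invalid UTF-8 through str with the
# surrogateescape error handler (PEP 383).
raw = b"caf\xe9/path"  # 0xE9 is latin-1 e-acute, invalid as UTF-8

# Decoding maps the bad byte to a lone surrogate (U+DCE9)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9/path"

# Encoding with the same handler restores the original bytes exactly,
# so data can pass through str-based APIs without loss.
assert text.encode("utf-8", "surrogateescape") == raw
```

This is the mechanism that lets filenames and other not-quite-text byte strings survive a decode/process/encode pipeline unchanged.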
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote: This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important. Different encodings have different characteristics, though, which makes them amenable to different types of optimizations. If you've got an ASCII string or a latin1 string, the optimizations of unicode are pretty obvious; if you've got one in UTF-16 with no multi-code-unit sequences, you could also hypothetically cheat for a while if you're on a UCS4 build of Python. I suspect the practical problem here is that there's no CharacterString ABC in the collections module for third-party libraries to provide their own peculiarly-optimized implementations that could lazily turn into real 'str's as needed. I'd volunteer to write a PEP if I thought I could actually get it done :-\. If someone else wants to be the primary author though, I'll try to help out.
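To make the idea concrete, here is a hypothetical sketch of what such an ABC might look like. Every name here (CharacterString, ASCIIBytesString) is invented for illustration; nothing like it exists in the collections module:

```python
from abc import ABCMeta, abstractmethod

class CharacterString(metaclass=ABCMeta):
    """Hypothetical ABC for lazily-materialized character strings."""

    @abstractmethod
    def __getitem__(self, index):
        """Return the character at the given index."""

    @abstractmethod
    def __len__(self):
        """Return the length in characters (not bytes)."""

    @abstractmethod
    def __str__(self):
        """Materialize into a real str only when one is actually needed."""

class ASCIIBytesString(CharacterString):
    """ASCII-only implementation: character indexing is O(1) on raw bytes."""

    def __init__(self, data):
        if not all(b < 128 for b in data):
            raise ValueError("not ASCII")
        self._data = data

    def __getitem__(self, index):
        return chr(self._data[index])

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return self._data.decode("ascii")

s = ASCIIBytesString(b"hello")
```

A third-party UTF-16 or lazily-indexed UTF-8 implementation would provide the same interface with its own encoding-specific cheats underneath.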
Re: [Python-Dev] bytes / unicode
On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote: One issue I remember from my enterprise days is some of the Asian-language developers at NTT/Verio explaining to me that unicode doesn't actually solve certain issues -- that there are use cases where you really *do* need bytes plus encoding in order to properly express something. The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in Asia would present characters differently if they were originally encoded in a Japanese encoding versus a Chinese encoding, even though they were really the same characters. I do know that Han Unification is a giant political mess (http://en.wikipedia.org/wiki/Han_unification makes for some interesting reading), but my understanding is that it has handled enough of the cases by now that one can write software to display Asian languages and it will basically work with a modern version of unicode. (And of course, there's always the private use area, as Stephen Turnbull pointed out.) Regardless, this is another example where keeping around a string isn't really enough. If you need to display a Japanese character in a distinct way because you are operating in the Japanese *script*, you need a tag surrounding your data that is a hint to its presentation. The fact that these presentation hints were sometimes determined by their encoding is an unfortunate historical accident.
Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote: However, I have very little experience with IRC and consequently have little idea what getting a permanent, owned, channel like #python entails. Hence the '?' that follows. What do others think? Sure, this is a good idea. Technically speaking, this is extremely easy. Somebody needs to /msg chanserv register #python3 and that's about it. (In this case, that someone may need to be Brett Cannon, since he is the official group contact for Freenode regarding Python-related channels.) Practically speaking, you will need a group of at least a dozen contributors, each in a different timezone, who sit there all day answering questions :). Otherwise the ownership of the channel is just a signpost pointing at an empty room.
Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
On Jun 19, 2010, at 5:39 PM, geremy condra wrote: Bottom line, what I'd really like to do is kick them all off of #python, but practically I see very little that can be done to rectify the situation at this point. Here's something you can do: port libraries to python 3 and make the ecosystem viable. It's as simple as that. Nobody on #python has an ideological axe to grind, they just want to tell users to use tools which actually solve their problems. (Well, unless you think that helping users is ideological axe-grinding, in which case I think you may want to re-examine your own premises.) If Python 3 had all the same features and libraries as Python 2, and ran in all the same places (for example, as Stephen Thorne reminded me when I asked him about this, the oldest supported version of Red Hat Enterprise Linux...) then it would be an equally viable answer on IRC. It's going to take a lot of work to get it to that point. Even if you write code, of course, it's too much work for one person to fill the whole gap. Have some patience. The PSF is funding these efforts, and more library authors are porting all the time. Eventually, resistance in forums like Freenode's #python will disappear. But you can't make it go away by wishing it away, you have to get rid of the cause.
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 24, 2010, at 5:36 AM, Brian Quinlan wrote: On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote: On May 23, 2010, at 2:37 AM, Brian Quinlan wrote: On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote: ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them but I couldn't find them listed. Linking to the pickle documentation might be a good start. Yes, the execution context is Executor-dependent. The section under ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think. I suppose so. I guess I'm just looking for more precise usage of terminology. (This is a PEP, after all. It's a specification that multiple VMs may have to follow, not just some user documentation for a package, even if they'll *probably* be using your code in all cases.) I'd be happier if there were a clearer term than calls for the things being scheduled (submissions?), since the done callbacks aren't called in the subprocess for ProcessPoolExecutor, as we just discussed. Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary. Your concern is that the thread will be killed when the interpreter exits? It won't be. Good to know. Tell it to the PEP though, not me ;). No reaction on [invoker vs. future]? I think you'll wish you did this in a couple of years when you start bumping into application code that calls set_result :). My reactions are mixed ;-) Well, you are not obliged to take my advice, as long as I am not obliged to refrain from mocking you mercilessly if it happens that I was right in a couple of years ;-). Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and Executors. 
Both tests and executors will still create and invoke methods directly on one object; the only additional difficulty seems to be the need to type '.future' every so often on the executor/testing side of things, and that seems a cost well worth paying to avoid confusion over who is allowed to call those methods and when. I also can't see a big problem in letting people call set_result in client code though it is documented as being only for Executor implementations and tests. On the implementation side, I don't see why an Invoker needs a reference to the future. Well, uh...

    class Invoker(object):
        def __init__(self):
            """Should only be called by Executor implementations."""
            self.future = Future()

^ this is what I'd call a reference to the future
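A runnable sketch of the producer/consumer split being proposed, with the Invoker owning the Future through a '.future' attribute. Apart from that attribute, every detail here is invented for illustration and deliberately simplified (no locking, no exception handling):

```python
class Future:
    """Consumer side: application code can only read results and
    attach callbacks -- there is no set_result to call by accident."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def result(self):
        if not self._done:
            raise RuntimeError("result not set yet")
        return self._result

    def add_done_callback(self, fn):
        if self._done:
            fn(self)  # already complete: invoke immediately
        else:
            self._callbacks.append(fn)

class Invoker:
    """Producer side: held only by the executor (or a test), which is
    the one party allowed to set the result."""

    def __init__(self):
        self.future = Future()

    def set_result(self, value):
        f = self.future
        f._result = value
        f._done = True
        for cb in f._callbacks:
            cb(f)

# The executor keeps the Invoker and hands out only inv.future:
inv = Invoker()
seen = []
inv.future.add_done_callback(lambda f: seen.append(f.result()))
inv.set_result(42)
```

Code that only ever sees inv.future has nothing like set_result in reach, which is exactly the confusion the split is meant to prevent.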
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 26, 2010, at 3:37 AM, Paul Moore wrote: On 26 May 2010 08:11, Lennart Regebro rege...@gmail.com wrote: On Wed, May 26, 2010 at 06:22, Nick Coghlan ncogh...@gmail.com wrote: - download a futures module from PyPI and live with the additional dependency Why would that be a problem? That has been hashed out repeatedly on this and other lists. Can it please be stipulated that for *some* people, in *some* cases, it is a problem? Sure, but I for one fully support Lennart asking the question, because while in the short term this *is* a problem with packaging tools in the Python ecosystem, in the long term (as you do note) it's an organizational dysfunction that can be addressed with better tools. I think it would be bad to ever concede the point that sane factoring of dependencies and code re-use aren't worth it because some jerk in Accounting or System Operations wants you to fill out a requisition form for a software component that's free and liberally licensed anyway. To support the unfortunate reality that such jerks in such departments really do in fact exist, there should be simple tools to glom a set of small, nicely factored dependencies into a giant monolithic ball of crud that installs all at once, and slap a sticker on the side of it that says I am only filling out your stupid form once, okay. This should be as distant as possible from the actual decision to package things in sensibly-sized chunks. In other words, while I kinda-sorta buy Brian's argument that having this module in easy reach will motivate more people to use a standard, tested idiom for parallelization, I *don't* think that the stdlib should be expanded simply to accommodate those who just don't want to install additional packages for anything.
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 26, 2010, at 4:55 AM, Brian Quinlan wrote: I said exactly the opposite of what I meant: futures don't need a reference to the invoker. Indeed they don't, and they really shouldn't have one. If I wrote that they did, then it was an error. ... and that appears to be it! Thank you for your very gracious handling of a pretty huge pile of criticism :). Good luck with the PEP, -glyph
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote: On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote: On May 22, 2010, at 8:47 PM, Brian Quinlan wrote: Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days. So fire away! As you wish! I retract my request ;-) May you get what you wish for, may you find what you are seeking :). The PEP should be consistent in its usage of terminology about callables. It alternately calls them callables, functions, and functions or methods. It would be nice to clean this up and be consistent about what can be called where. I personally like callables. Did you find the terminology confusing? If not then I propose not changing it. Yes, actually. Whenever I see references to the multiprocessing module, I picture a giant HERE BE (serialization) DRAGONS sign. When I saw that some things were documented as being functions, I thought that maybe there was intended to be a restriction like "these can only be top-level functions" so they're easy for different executors to locate and serialize. I didn't realize that the intent was arbitrary callables until I carefully re-read the document and noticed that the terminology was inconsistent. But changing it in the user docs is probably a good idea. I like callables too. Great. Still, users will inevitably find the PEP and use it as documentation too. The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly. Any response to this bit? Did I miss something in the PEP? More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. 
If I'm reading the sample implementation correctly, http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine). The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract? Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary. This is really minor, I know, but why does it say NOTE: This method can be used to create adapters from Futures to Twisted Deferreds? First of all, what's the deal with NOTE; it's the only NOTE in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write? I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion: discussion snipped I did not communicate that well. 
As implemented, it's quite possible to implement a translation layer which turns a Future into a Deferred. What I meant by that comment was, the specification in the PEP was too loose to be sure that such a layer would work with arbitrary executors. For what it's worth, the Deferred translator would look like this, if you want to include it in the PEP (untested though, you may want to run it first):

    from twisted.internet.defer import Deferred
    from twisted.internet.reactor import callFromThread

    def future2deferred(future):
        d = Deferred()
        def invoke_deferred():
            try:
                result = future.result()
            except:
                d.errback()
            else:
                d.callback(result)
        def done_callback(same_future):
            callFromThread(invoke_deferred)
        future.add_done_callback(done_callback)
        return d

This does beg the question of what the traceback will look like in that except: block though. I guess the multi-threaded executor will use python3 exception chaining so Deferred should be able to show a sane
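For readers following along later: the module under discussion eventually shipped as concurrent.futures in Python 3.2, and the done-callback contract being debated here can be exercised directly. This is a small illustration, not part of the original exchange:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

records = []

def on_done(future):
    # Record which thread ran the callback alongside the result;
    # nothing here assumes it is (or is not) the main thread.
    records.append((threading.current_thread().name, future.result()))

with ThreadPoolExecutor(max_workers=1) as pool:
    f = pool.submit(pow, 2, 10)
    f.add_done_callback(on_done)
# Exiting the with-block waits for the work item (and thus the
# callback it triggers) to complete.
```

In the stdlib as shipped, a callback added to an already-completed future runs immediately in the calling thread, which is exactly the sort of contract detail this part of the thread was pushing to have spelled out.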
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote: Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days. So fire away! As you wish! The PEP should be consistent in its usage of terminology about callables. It alternately calls them callables, functions, and functions or methods. It would be nice to clean this up and be consistent about what can be called where. I personally like callables. The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly. More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine). This is really minor, I know, but why does it say NOTE: This method can be used to create adapters from Futures to Twisted Deferreds? First of all, what's the deal with NOTE; it's the only NOTE in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. 
But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write? The fact that add_done_callback is implemented using a set is weird, since it means you can't add the same callback more than once. The set implementation also means that the callbacks get called in a semi-random order, potentially creating even _more_ hard-to-debug order of execution issues than you'd normally have with futures. And I think that this documentation will be unclear to a lot of novice developers: many people have trouble with the idea that a = Foo(); b = Foo(); a.bar_method != b.bar_method, but import foo_module; foo_module.bar_function == foo_module.bar_function. It's also weird that you can remove callbacks - what's the use case? Deferreds have no callback-removal mechanism and nobody has ever complained of the need for one, as far as I know. (But lots of people do add the same callback multiple times.) I suggest having add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback. futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is (Executor is an abstract class...). Plus, A third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map. One minor suggestion on the internal future methods bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. 
That seems to be the case with these internal methods - they are not so much internal as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it Invoker(). A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute. Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :). Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
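The identity subtlety called out above -- the reason a set of callbacks behaves surprisingly -- can be demonstrated directly (a small illustration, not from the original message):

```python
import math

class Foo:
    def bar_method(self):
        pass

a, b = Foo(), Foo()

# Bound methods from different instances compare unequal,
# so a set of callbacks will happily hold both...
assert a.bar_method != b.bar_method

# ...while a module-level function is one shared object, so adding
# it "twice" to a set silently deduplicates it.
assert math.sqrt == math.sqrt

callbacks = {a.bar_method, b.bar_method, math.sqrt, math.sqrt}
# Only three distinct callbacks survive: the duplicate math.sqrt
# collapsed, the two bound methods did not.
```

A list-based implementation has neither surprise: every add is kept, in order.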