Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Nick Coghlan
Fred L. Drake, Jr. wrote:
 On Saturday 10 June 2006 12:34, Fredrik Lundh wrote:
   if all undocumented modules had as much documentation and articles as
   ET, the world would be a lot better documented ;-)
  
   I've posted a text version of the xml.etree.ElementTree PythonDoc here:
 
 Here's a question that we should answer before the beta:
 
 With the introduction of the xmlcore package in Python 2.5, should we document
 xml.etree or xmlcore.etree?  If someone installs PyXML with Python 2.5, I
 don't think they're going to get xml.etree, which will be really confusing.
 We can be sure that xmlcore.etree will be there.
 
 I'd rather not propagate the pain caused by the xml package insanity any further.

+1 for 'xmlcore.etree'.

I don't use XML very much, and it was thoroughly confusing to find that 
published XML related code didn't work on my machine, even though the stdlib 
claimed to provide an 'xml' module (naturally, the published code needed the 
full version of PyXML, but I didn't know that at the time).

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Segmentation fault in collections.defaultdict

2006-06-11 Thread Nick Coghlan
Kevin Jacobs [EMAIL PROTECTED] wrote:
 Try this at home:
 import collections
 d=collections.defaultdict(int)
 d.iterkeys().next()  # Seg fault
 d.iteritems().next() # Seg fault
 d.itervalues().next() # Fine and dandy

This all worked fine for me in rev 46739 and 46849 (Kubuntu 6.06, gcc 4.0.3).

 Python version:
 Python 2.5a2 (trunk:46822M, Jun 10 2006, 13:14:15)
 [GCC 4.0.2 20050901 (prerelease) (SUSE Linux)] on linux2

Either something got broken and then fixed again between the two revs I tried, 
there's a problem specific to GCC 4.0.2, or there's a problem with whatever 
local modifications you have in your working copy :)
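For comparison, here is a minimal sketch of the same reproduction in modern Python 3, where the 2.x iterkeys()/iteritems()/itervalues() methods become the keys()/items()/values() views. On an empty defaultdict all three should simply raise StopIteration, since iteration never calls the default factory (only a missing-key lookup such as d[key] does). The helper name probe_empty_defaultdict is purely illustrative.

```python
import collections

def probe_empty_defaultdict():
    """Try to take the first item from each view of an empty defaultdict."""
    d = collections.defaultdict(int)
    results = {}
    for name, view in (("keys", d.keys()), ("items", d.items()), ("values", d.values())):
        try:
            next(iter(view))
            results[name] = "got an item"
        except StopIteration:
            results[name] = "StopIteration"  # expected: the dict is empty
    # Iterating never invokes the int() factory, so no entries were created.
    return results, len(d)

print(probe_empty_defaultdict())
```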

Cheers,
Nick.



Re: [Python-Dev] Segmentation fault in collections.defaultdict

2006-06-11 Thread Georg Brandl
Nick Coghlan wrote:
 Kevin Jacobs [EMAIL PROTECTED] wrote:
 Try this at home:
 import collections
 d=collections.defaultdict(int)
 d.iterkeys().next()  # Seg fault
 d.iteritems().next() # Seg fault
 d.itervalues().next() # Fine and dandy
 
 This all worked fine for me in rev 46739 and 46849 (Kubuntu 6.06, gcc 4.0.3).
 
 Python version:
 Python 2.5a2 (trunk:46822M, Jun 10 2006, 13:14:15)
 [GCC 4.0.2 20050901 (prerelease) (SUSE Linux)] on linux2
 
 Either something got broken and then fixed again between the two revs I tried,
 there's a problem specific to GCC 4.0.2, or there's a problem with whatever 
 local modifications you have in your working copy :)

Same here. I tried with the same revision as Kevin, and got no segfault
at all (using GCC 4.1.1 on Linux).

Note that GCC 4.0.2 20050901 (prerelease) sounds like something that's not
really been thoroughly tested ;)

Georg



[Python-Dev] crash in dict on gc collect

2006-06-11 Thread Neal Norwitz
I wonder if this is similar to Kevin's problem?  I couldn't reproduce
his problem though.  This happens with both debug and release builds.
Not sure how to reduce the test case.  pychecker was just iterating
through the byte codes.  It wasn't doing anything particularly
interesting.

./python pychecker/pychecker/checker.py Lib/encodings/cp1140.py

0x004cfa18 in visit_decref (op=0x661180, data=0x0) at gcmodule.c:270
270 if (PyObject_IS_GC(op)) {
(gdb) bt
#0  0x004cfa18 in visit_decref (op=0x661180, data=0x0) at gcmodule.c:270
#1  0x004474ab in dict_traverse (op=0x7cdd90, visit=0x4cf9e0 <visit_decref>, arg=0x0) at dictobject.c:1819
#2  0x004cfaf0 in subtract_refs (containers=0x670240) at gcmodule.c:295
#3  0x004d07fd in collect (generation=0) at gcmodule.c:790
#4  0x004d0ad1 in collect_generations () at gcmodule.c:897
#5  0x004d1505 in _PyObject_GC_Malloc (basicsize=56) at gcmodule.c:1332
#6  0x004d1542 in _PyObject_GC_New (tp=0x64f4a0) at gcmodule.c:1342
#7  0x0041d992 in PyInstance_NewRaw (klass=0x2a95dffcc0, dict=0x800e80) at classobject.c:505
#8  0x0041dab8 in PyInstance_New (klass=0x2a95dffcc0, arg=0x2a95f5f9e0, kw=0x0) at classobject.c:525
#9  0x0041aa4e in PyObject_Call (func=0x2a95dffcc0, arg=0x2a95f5f9e0, kw=0x0) at abstract.c:1802
#10 0x0049ecd2 in do_call (func=0x2a95dffcc0, pp_stack=0x7fbfffb5b0, na=3, nk=0) at ceval.c:3785
#11 0x0049e46f in call_function (pp_stack=0x7fbfffb5b0, oparg=3) at ceval.c:3597


Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Fredrik Lundh
Fred L. Drake, Jr. wrote:

 With the introduction of the xmlcore package in Python 2.5, should we document
 xml.etree or xmlcore.etree?  If someone installs PyXML with Python 2.5, I
 don't think they're going to get xml.etree, which will be really confusing.
 We can be sure that xmlcore.etree will be there.

I think it would be unfortunate if an external, mostly unmaintained 
package could claim absolute ownership of the xml package root.

how about tweaking the xml loader to map xml.foo to _xmlplus.foo 
only if that subpackage really exists ?
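A rough sketch of that mapping in modern Python 3 terms, assuming a hypothetical helper resolve_xml_subpackage (this is not how the 2.5 xml/__init__.py is actually written): prefer _xmlplus.foo only when importlib can actually find it, otherwise fall back to the stdlib copy.

```python
import importlib
import importlib.util

def resolve_xml_subpackage(name, preferred="_xmlplus", fallback="xmlcore"):
    """Import `preferred.name` if that subpackage really exists, else `fallback.name`."""
    candidate = preferred + "." + name
    try:
        spec = importlib.util.find_spec(candidate)
    except ModuleNotFoundError:
        spec = None  # the preferred top-level package is not installed at all
    if spec is not None:
        return importlib.import_module(candidate)
    return importlib.import_module(fallback + "." + name)

# Neither _xmlplus nor xmlcore exists today, so use the plain xml package as fallback.
print(resolve_xml_subpackage("etree", fallback="xml").__name__)
```

The point of checking the subpackage (rather than just the presence of _xmlplus) is that a partial external install no longer hides stdlib-only subpackages.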

/F



Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Simon Percivall
On 11 jun 2006, at 12.09, Fredrik Lundh wrote:
 Fred L. Drake, Jr. wrote:

 With the introduction of the xmlcore package in Python 2.5, should we document
 xml.etree or xmlcore.etree?  If someone installs PyXML with Python 2.5, I
 don't think they're going to get xml.etree, which will be really confusing.
 We can be sure that xmlcore.etree will be there.

 I think it would be unfortunate if an external, mostly unmaintained
 package could claim absolute ownership of the xml package root.

 how about tweaking the xml loader to map xml.foo to _xmlplus.foo
 only if that subpackage really exists ?

I'm a bit confused by what the problem is. I thought this was all
handled like it should be now.

  >>> import xml.etree
  >>> xml.etree
  <module 'xml.etree' from '.../lib/python2.5/xmlcore/etree/__init__.pyc'>
  >>> import xml.sax
  >>> xml.sax
  <module 'xml.sax' from '.../lib/python2.5/site-packages/_xmlplus/sax/__init__.pyc'>

It picks up modules from both places

//Simon


Re: [Python-Dev] UUID module

2006-06-11 Thread Giovanni Bajo
Ka-Ping Yee [EMAIL PROTECTED] wrote:

 Quite a few people have expressed interest in having UUID
 functionality in the standard library, and previously on this
 list some suggested possibly using the uuid.py module i wrote:

 http://zesty.ca/python/uuid.py


Some comments on the code:

 for dir in ['', r'c:\windows\system32', r'c:\winnt\system32']:

Can we get rid of these absolute paths? Something like this should suffice:

>>> from ctypes import *
>>> buf = create_string_buffer(4096)
>>> windll.kernel32.GetSystemDirectoryA(buf, 4096)
17
>>> buf.value.decode("mbcs")
u'C:\\WINNT\\system32'


    for function in functions:
        try:
            _node = function()
        except:
            continue

This also hides typos and whatnot. I guess it's better if each function catches
its own exceptions and either returns None or raises a common exception (like a
class _GetNodeError(RuntimeError)) which is then caught.

Giovanni Bajo



Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Fredrik Lundh
Simon Percivall wrote:

 how about tweaking the xml loader to map xml.foo to _xmlplus.foo
 only if that subpackage really exists ?
 
 I'm a bit confused by what the problem is. I though this was all
 handled like it should be now.

that's how I thought things were done, but then I read Fred's post, and 
looked at the source code, and didn't see this line:

 _xmlplus.__path__.extend(xmlcore.__path__)
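For anyone who hasn't seen the trick: appending one package's __path__ to another's makes the import system search both directories for submodules. A self-contained Python 3 sketch, using throwaway packages named plus_pkg and core_pkg as hypothetical stand-ins for _xmlplus and xmlcore:

```python
import importlib
import os
import sys
import tempfile

def make_package(root, name, modules):
    """Create package `name` under `root` with submodules given as {name: source}."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    for mod, source in modules.items():
        with open(os.path.join(pkg_dir, mod + ".py"), "w") as f:
            f.write(source)

root = tempfile.mkdtemp()
make_package(root, "plus_pkg", {"sax": "ORIGIN = 'plus_pkg'\n"})
make_package(root, "core_pkg", {"etree": "ORIGIN = 'core_pkg'\n"})
sys.path.insert(0, root)
importlib.invalidate_caches()

import plus_pkg
import core_pkg

# After this, submodule lookups under plus_pkg also search core_pkg's directory.
plus_pkg.__path__.extend(core_pkg.__path__)

sax = importlib.import_module("plus_pkg.sax")      # found in plus_pkg itself
etree = importlib.import_module("plus_pkg.etree")  # found via core_pkg's path
print(sax.ORIGIN, etree.ORIGIN)
```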

or-maybe-someone's-been-using-the-time-machine-ly yrs /F



[Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby
Planet is a feed aggregator written in Python.  It depends heavily on 
SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
and I've submitted a test case and a patch[1] (use or discard the patch, 
it is the test that I care about).

While looking around, a few things surfaced.  For starters, it would 
seem that the version of sgmllib in SVN HEAD will selectively unescape 
certain character references that might appear in an attribute.  I say 
selectively, as:

  * it will unescape  &amp;
  * it won't unescape &copy;
  * it will unescape  &#38;
  * it won't unescape &#x26;
  * it will unescape  &#146;
  * it won't unescape &#8217;

There are a number of issues here.  While not unescaping anything is 
suboptimal, at least the recipient is aware of exactly which characters 
have been unescaped (i.e., none of them).  The proposed solution makes 
it impossible for the recipient to know which characters are unescaped, 
and which are original.  (Note: feeds often contain such abominations as
&amp;copy; which the new code will treat indistinguishably from &copy;)

Additionally, there is a unicode issue here - one that is shared by 
handle_charref, but at least that method is overrideable.  If unescaping 
remains, do it for hex character references and for values greather than 
8-bits, i.e., use unichr instead of chr if the value is greater than 127.
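A minimal Python 3 sketch of the suggested behaviour, assuming a hypothetical helper unescape_charrefs: it decodes both decimal and hex character references across the full Unicode range (Python 3's chr covers what unichr did in 2.x) and leaves named entities such as &copy; and &amp;copy; untouched. Note that &#146; decodes to the C1 control character U+0092, not to a Windows-1252 apostrophe.

```python
import re

# Decimal (&#38;) or hexadecimal (&#x26;) character references.
_CHARREF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def unescape_charrefs(text):
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if codepoint > 0x10FFFF:
            return match.group(0)  # not a valid code point: leave it as written
        return chr(codepoint)      # full Unicode range, not just 8 bits
    return _CHARREF.sub(replace, text)

print(unescape_charrefs("&#38; &#x26; &#8217; &copy;"))
```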

- Sam Ruby

[1] http://tinyurl.com/j4a6n


Re: [Python-Dev] Switch statement

2006-06-11 Thread Talin
Greg Ewing wrote:
 [EMAIL PROTECTED] wrote:
 
 
    switch raw_input("enter a, b or c: "):
        case 'a':
            print 'yay! an a!'
        case 'b':
            print 'yay! a b!'
        case 'c':
            print 'yay! a c!'
        else:
            print 'hey dummy! I said a, b or c!'
 
 
 Before accepting this, we could do with some debate about the
 syntax. It's not a priori clear that C-style switch/case is
 the best thing to adopt.

Since you don't have the 'fall-through' behavior of C, I would also 
assume that you could associate more than one value with a case, i.e.:

case 'a', 'b', 'c':
   ...

It seems to me that the value of a 'switch' statement is that it is a 
computed jump - that is, instead of having to iteratively test a bunch 
of alternatives, you can directly jump to the code for a specific value.

I can see this being very useful for parser generators and state machine 
code. At the moment, similar things can be done with hash tables of 
functions, but those have a number of limitations, such as the fact that 
they can't access local variables.
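For reference, the hash-table-of-functions approach looks roughly like this in Python 3 (the handler names are made up for illustration). It shows both the single dict lookup per value and the limitation mentioned above: the handlers cannot touch process()'s locals, so state has to be threaded through explicitly.

```python
def _on_a(state):
    state["seen_a"] += 1
    return "yay! an a!"

def _on_b_or_c(state):
    return "yay! a b or a c!"

# Built once: each token is one hash lookup, and several keys may share a handler.
DISPATCH = {"a": _on_a, "b": _on_b_or_c, "c": _on_b_or_c}

def process(tokens):
    """Dispatch each token through DISPATCH; return the replies and the 'a' count."""
    # The handlers are separate functions: they cannot read or rebind this
    # function's local variables, so mutable state is passed in explicitly.
    state = {"seen_a": 0}
    replies = []
    for tok in tokens:
        handler = DISPATCH.get(tok)
        replies.append(handler(state) if handler else "hey dummy! I said a, b or c!")
    return replies, state["seen_a"]

print(process(["a", "c", "x", "a"]))
```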

I don't have any specific syntax proposals, but I notice that the suite 
that follows the switch statement is not a normal suite, but a 
restricted one, and I am wondering if we could come up with a syntax 
that avoids having a special suite.

Here's an (ugly) example, not meant as a serious proposal:

select (x) when 'a':
   ...
when 'b', 'c':
   ...
else:
   ...

The only real difference between this and an if-else chain is that the 
compiler knows that all of the test expressions are constants and can be 
hashed at compile time.

-- Talin


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Aahz
On Sun, Jun 11, 2006, Sam Ruby wrote:

 Planet is a feed aggregator written in Python.  It depends heavily on 
 SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
 and I've submitted a test case and a patch[1] (use or discard the patch, 
 it is the test that I care about).
 
 [1] http://tinyurl.com/j4a6n

When providing links to SF, please use the python.org tinyurl equivalent
to ensure that people can easily see the bug/patch number:

http://www.python.org/sf?id=1504333
-- 
Aahz ([EMAIL PROTECTED])   * http://www.pythoncraft.com/

I saw `cout' being shifted "Hello world" times to the left and stopped
right there.  --Steve Gonedes


[Python-Dev] Import semantics

2006-06-11 Thread Fabio Zadrozny
Python and Jython import semantics differ on how sub-packages should be
accessed after importing some module:

Jython 2.1 on java1.5.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> import xml
>>> xml.dom
<module xml.dom at 10340434>

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml
>>> xml.dom
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'dom'
>>> from xml.dom import pulldom
>>> xml.dom
<module 'xml.dom' from 'C:\bin\Python24\lib\xml\dom\__init__.pyc'>

Note that in Jython importing a module makes all subpackages beneath it
available, whereas in Python only the tokens available in __init__.py are
accessible. However, if you do load the module later, even without getting
it directly into the namespace, it becomes accessible too. This seems
unexpected to me: I would expect it to be available only if I did some
"import xml.dom" at some point.

My problem is that in Pydev, in static analysis, I would only get the
tokens available for actually imported modules, but that's not true for
Jython, and I'm not sure if the current behaviour in Python was expected.

So... which would be the right semantics for this?

Thanks,

Fabio
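The Python behaviour described above can be reproduced with a throwaway package (the name demo_pkg is hypothetical). A minimal Python 3 sketch: importing the package alone does not bind the submodule, but any import of demo_pkg.sub, even a from-import that only binds VALUE locally, also sets sub as an attribute on the parent package.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_pkg/__init__.py (empty) and demo_pkg/sub.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg
before = hasattr(demo_pkg, "sub")  # False: __init__.py never imports sub

from demo_pkg.sub import VALUE     # binds only VALUE in this namespace...
after = hasattr(demo_pkg, "sub")   # ...yet the import system sets demo_pkg.sub

print(before, after, VALUE)
```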


Re: [Python-Dev] Switch statement

2006-06-11 Thread skip

talin Since you don't have the 'fall-through' behavior of C, I would
talin also assume that you could associate more than one value with a
talin case, i.e.:

talin case 'a', 'b', 'c':
talin...

As Andrew Koenig pointed out, that's not discussed in the PEP.  Given the
various examples though, I would have to assume the above is equivalent to

case ('a', 'b', 'c'):
...

since in all cases the PEP implies a single expression.

talin It seems to me that the value of a 'switch' statement is that it
talin is a computed jump - that is, instead of having to iteratively
talin test a bunch of alternatives, you can directly jump to the code
talin for a specific value.

I agree, but that of course limits the expressions to constants which can be
evaluated at compile-time as I indicated in my previous mail.  Also, as
someone else pointed out, that probably prevents something like

START_TOKEN = ''
END_TOKEN = ''

...

switch expr:
case START_TOKEN:
...
case END_TOKEN:
...

The PEP states that the case clauses must accept constants, but the sample
implementation allows arbitrary expressions.  If we assume that the case
expressions need not be constants, does that force the compiler to evaluate
the case expressions in the order given in the file?  To make my dumb
example from yesterday even dumber:

def f():
    switch raw_input("enter b, d or f: "):
        case incr('a'):
            print 'yay! a b!'
        case incr('b'):
            print 'yay! a d!'
        case incr('c'):
            print 'yay! an f!'
        else:
            print 'hey dummy! I said b, d or f!'

_n = 0
def incr(c):
    global _n
    try:
        return chr(ord(c)+1+_n)
    finally:
        _n += 1
        print _n

The cases must be evaluated in the order they are written for the example to
work properly.
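A Python 3 port of the incr() helper makes the ordering dependence concrete (make_incr and case_values are illustrative names). Evaluating the three case expressions in source order gives the intended b, d, f; evaluating them in reverse gives d for every case, so a dict built from them would end up with a single key.

```python
def make_incr():
    """Return a fresh Python 3 port of Skip's side-effecting incr() helper."""
    n = 0
    def incr(c):
        nonlocal n
        try:
            return chr(ord(c) + 1 + n)
        finally:
            n += 1
    return incr

def case_values(order):
    """Evaluate incr('a'), incr('b'), incr('c') in `order` (indices into the
    three cases) and return the resulting case values in source order."""
    incr = make_incr()
    args = ["a", "b", "c"]
    values = [None] * len(args)
    for index in order:
        values[index] = incr(args[index])
    return values

print(case_values((0, 1, 2)))  # ['b', 'd', 'f']
print(case_values((2, 1, 0)))  # ['d', 'd', 'd']
```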

The tension between efficient run-time and Python's highly dynamic nature
would seem to prevent the creation of a switch statement that will satisfy
all demands.

Skip


Re: [Python-Dev] Switch statement

2006-06-11 Thread Fredrik Lundh
Talin wrote:

 I don't have any specific syntax proposals, but I notice that the suite 
 that follows the switch statement is not a normal suite, but a 
 restricted one, and I am wondering if we could come up with a syntax 
 that avoids having a special suite.

don't have K&R handy, but I'm pretty sure they put switch and case at
the same level (just like if/else), thus eliminating the need for silly 
special suites.

 The only real difference between this and an if-else chain is that the 
 compiler knows that all of the test expressions are constants and can be 
 hashed at compile time.

the compiler can of course figure that out also for if/elif/else
statements, by inspecting the AST.  the only advantage for switch/case is
user syntax...
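As a rough illustration of inspecting the AST, here is a Python 3 sketch (constant_dispatch_keys is a hypothetical helper) that recognises an if/elif chain in which every test is `name == constant` on the same name. It only checks the syntactic shape; a real compiler would also have to worry that a user-defined __eq__ could make a hash lookup behave differently from the chain of comparisons.

```python
import ast

def constant_dispatch_keys(source):
    """Return the constants an if/elif chain compares one name against,
    or None if the chain is not purely `name == constant` tests."""
    node = ast.parse(source).body[0]
    if not isinstance(node, ast.If):
        return None
    subject, keys = None, []
    while True:
        test = node.test
        if not (isinstance(test, ast.Compare)
                and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Eq)
                and isinstance(test.left, ast.Name)
                and isinstance(test.comparators[0], ast.Constant)):
            return None
        if subject is None:
            subject = test.left.id
        elif test.left.id != subject:
            return None  # the chain switches to testing a different name
        keys.append(test.comparators[0].value)
        # An `elif` is represented as an If node that is the sole item in orelse.
        if len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If):
            node = node.orelse[0]
        else:
            return keys

chain = """
if x == 'a':
    pass
elif x == 'b':
    pass
else:
    pass
"""
print(constant_dispatch_keys(chain))
```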

/F



Re: [Python-Dev] Switch statement

2006-06-11 Thread Talin
[EMAIL PROTECTED] wrote:
 talin Since you don't have the 'fall-through' behavior of C, I would
 talin also assume that you could associate more than one value with a
 talin case, i.e.:
 
 talin case 'a', 'b', 'c':
 talin...
 
 As Andrew Koenig pointed out, that's not discussed in the PEP.  Given the
 various examples though, I would have to assume the above is equivalent to
 
 case ('a', 'b', 'c'):
 ...

I had recognized that ambiguity as well, but chose not to mention it :)

 since in all cases the PEP implies a single expression.
 
 talin It seems to me that the value of a 'switch' statement is that it
 talin is a computed jump - that is, instead of having to iteratively
 talin test a bunch of alternatives, you can directly jump to the code
 talin for a specific value.
 
 I agree, but that of course limits the expressions to constants which can be
 evaluated at compile-time as I indicated in my previous mail.  Also, as
 someone else pointed out, that probably prevents something like
 
 START_TOKEN = ''
 END_TOKEN = ''
 
 ...
 
 switch expr:
 case START_TOKEN:
 ...
 case END_TOKEN:
 ...

Here's another ugly thought experiment, not meant as a serious proposal; 
its intent is to stimulate ideas by breaking preconceptions. Suppose we
take the notion of a computed jump literally:

def myfunc( x ):
    goto dispatcher[ x ]

    section s1:
        ...

    section s2:
        ...

dispatcher = {'a': myfunc.s1, 'b': myfunc.s2}

No, I am *not* proposing that Python add a goto statement. What I am 
really talking about is the idea that you could (somehow) use a 
dictionary as the input to a control construct.

In the above example, rather than allowing arbitrary constant 
expressions as cases, we would require the compiler to generate a set of 
opaque tokens representing various code fragments. These fragments would 
be exactly like inner functions, except that they don't have their own 
scope (and therefore have no parameters either).

Since the jump labels are symbols generated by the compiler, there's no 
ambiguity about when they get evaluated.

The above example also allows these labels to be accessed externally 
from the function by defining attributes on the function object itself 
which correspond to the code fragments.

So in the example, the dictionary which associates specific values with 
executable sections is created once, at runtime, but before the first 
time that myfunc is called.

Of course, this is quite a bit clumsier than a switch statement, which 
is why I say it's not a serious proposal.

-- Talin


[Python-Dev] subprocess.Popen(.... stdout=IGNORE, ...)

2006-06-11 Thread Martin Blais
In the subprocess module, by default the file handles in the child
are inherited from the parent.  To ignore a child's output, I can use
the stdout or stderr options to send the output to a pipe::

   p = Popen(command, stdout=PIPE, stderr=PIPE)

However, this is sensitive to the buffer deadlock problem, where for
example the buffer for stderr might become full and a deadlock occurs
because the child is blocked on writing to stderr and the parent is
blocked on reading from stdout or waiting for the child to finish.

For example, using this command will cause deadlock::

   call('cat /boot/vmlinuz'.split(), stdout=PIPE, stderr=PIPE)

Popen.communicate() implements a solution using either select() or
multiple threads (under Windows) to read from the pipes, and returns
the strings as a result.  It works out like this::

   p = Popen(command, stdout=PIPE, stderr=PIPE)
   output, errors = p.communicate()
   if p.returncode != 0:
…

Now, as a user of the subprocess module, sometimes I just want to
call some child process and simply ignore its output, and to do so I
am forced to use communicate() as above and wastefully capture and
ignore the strings.  This is actually quite a common use case.  Just
run something, and check the return code.  Right now, in order to do
this without polluting the parent's output, you cannot use the call()
convenience (or is there another way?).

A workaround that works under UNIX is to do this::

   FNULL = open('/dev/null', 'w')
   returncode = call(command, stdout=FNULL, stderr=FNULL)
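A portable variant of the same workaround, as a Python 3 sketch (call_quietly is a hypothetical name): os.devnull names the null device on every platform, '/dev/null' on POSIX and 'nul' on Windows.

```python
import os
import subprocess
import sys

def call_quietly(command):
    """Run `command` with stdout and stderr discarded; return its exit code."""
    # os.devnull is '/dev/null' on POSIX and 'nul' on Windows.
    with open(os.devnull, "w") as fnull:
        return subprocess.call(command, stdout=fnull, stderr=fnull)

# Example: a child that prints something and exits with status 3.
rc = call_quietly([sys.executable, "-c", "print('noise'); raise SystemExit(3)"])
```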


Some feedback requested, I'd like to know what you think:

1. Would it not be nice to add an IGNORE constant to subprocess.py
   that would do this automatically? i.e. ::

 returncode = call(command, stdout=IGNORE, stderr=IGNORE)

   Rather than capture and accumulate the output, it would find an
   appropriate OS-specific way to ignore the output (the /dev/null file
   above works well under UNIX, how would you do this under Windows?
   I'm sure we can find something.)

2. call() should be modified to not be sensitive to the deadlock
   problem, since its interface provides no way to return the
   contents of the output.  The IGNORE value provides a possible
   solution for this.

3. With the /dev/null file solution, the following code actually
   works without deadlock, because stderr is never blocked on writing
   to /dev/null::

 p = Popen(command, stdout=PIPE, stderr=IGNORE)
 text = p.stdout.read()
 retcode = p.wait()

   Any idea how this idiom could be supported using a more portable
   solution (i.e. how would I make this idiom under Windows, is there
   some equivalent to /dev/null)?
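For what it's worth, Python 3.3 later added subprocess.DEVNULL, which behaves much like the IGNORE constant proposed here. A minimal sketch of points 1 and 3 using it (the child command is an arbitrary example that writes far more to stderr than a pipe buffer typically holds):

```python
import subprocess
import sys

# A child that writes a little to stdout and a lot to stderr (illustrative).
CHILD = "import sys; sys.stdout.write('o' * 10); sys.stderr.write('e' * 200000)"

def read_stdout_ignore_stderr(command):
    """Point 3's idiom: capture stdout, discard stderr, then wait for the exit code."""
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    with p.stdout:
        text = p.stdout.read()
    return text, p.wait()

text, retcode = read_stdout_ignore_stderr([sys.executable, "-c", CHILD])

# Point 1: run and check the return code, discarding all output.
quiet_rc = subprocess.call([sys.executable, "-c", CHILD],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```

Because stderr goes to the null device rather than an unread pipe, the child never blocks on it, so reading stdout to EOF and then waiting cannot deadlock.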


Re: [Python-Dev] UUID module

2006-06-11 Thread Ka-Ping Yee
Thomas Heller wrote:
 I don't know if this is the uuidgen you're talking about, but
 on linux there is libuuid:

Thanks!

Okay, that's in there now.  Have a look at http://zesty.ca/python/uuid.py .

Phillip J. Eby wrote:
 By the way, I'd love to see a uuid.uuid() constructor that simply calls the
 platform-specific default UUID constructor (CoCreateGuid or uuidgen(2)),

I've added code to make uuid1() use uuid_generate_time() if available
and uuid4() use uuid_generate_random() if available.  These functions
are provided on Mac OS X (in libc) and on Linux (in libuuid).  Does
that work for you?

I'm using the Windows UUID generation calls (UuidCreate and
UuidCreateSequential in rpcrt4) only to get the hardware address,
not to make UUIDs, because they yield results that aren't compliant
with RFC 4122.  Even worse, they actually have the variant bits set
to say that they are RFC 4122, but they can have an illegal version
number.  If there are better alternatives on Windows, i'm happy to
use them.
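A small Python 3 check for the problem described above, using the uuid module that later landed in the standard library (is_rfc4122_compliant is a hypothetical helper): a UUID can carry the RFC 4122 variant bits while its version nibble lies outside the range 1 to 5 that RFC 4122 defines.

```python
import uuid

def is_rfc4122_compliant(u):
    """True if `u` has the RFC 4122 variant bits and a version RFC 4122 defines (1-5)."""
    # UUID.version is None unless the variant is RFC_4122.
    return u.variant == uuid.RFC_4122 and u.version in (1, 2, 3, 4, 5)

# RFC 4122 variant bits ("8" in the clock_seq field) but version nibble 0xf.
suspicious = uuid.UUID("00000000-0000-f000-8000-000000000000")
print(is_rfc4122_compliant(uuid.uuid4()), is_rfc4122_compliant(suspicious))
```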


-- ?!ng


[Python-Dev] Should hex() yield 'L' suffix for long numbers?

2006-06-11 Thread Ka-Ping Yee
I did this earlier:

>>> hex(9999999999999)
'0x9184e729fffL'

and found it a little jarring, because i feel there's been a general
trend toward getting rid of the 'L' suffix in Python.

Literal long integers don't need an L anymore; they're automatically
made into longs if the number is too big.  And while the repr() of
a long retains the L on the end, the str() of a long does not, and
i rather like that.

So i kind of expected that hex() would not include the L either.
I see its main job as just giving me the hex digits (in fact, for
Python 3000 i'd prefer even to drop the '0x' as well), and the L
seems superfluous and distracting.
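For comparison, in Python 3 (where int and long were unified) hex() no longer appends an L, and format(n, 'x') already gives just the digits without '0x'. A small sketch (hex_digits is an illustrative name):

```python
def hex_digits(n):
    """Return the hexadecimal digits of a non-negative integer, without '0x'."""
    return format(n, "x")

big = 9999999999999
print(hex(big))         # 0x9184e729fff (Python 3: no trailing L)
print(hex_digits(big))  # 9184e729fff
```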

What do you think?  Is Python 2.5 a reasonable time to drop this L?


-- ?!ng


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-11 Thread Greg Ewing
Terry Jones wrote:

 Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
 algorithm from taking multiple already returned values and combining them
 in some (deterministic) way to generate > 5 outcomes.

No, it's not. As long as the RNG output is the only input to
the algorithm, and the algorithm is deterministic, it is
not possible to get more than N different outcomes. It doesn't
matter what the algorithm does with the input.
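The argument can be made concrete with a Python 3 sketch (the toy RNG and algorithm below are hypothetical): if a run is fully determined by the RNG's starting state, enumerating all starting states bounds the number of distinct complete runs, however many draws each run makes or how it combines them.

```python
PERIOD = 5
VALUES = [1.0, 4.0, 7.0, 10.0, 13.0]  # the RNG's output for each internal state

def make_rng(state):
    """Toy RNG whose entire internal state is an index in range(PERIOD)."""
    def rng():
        nonlocal state
        state = (state + 1) % PERIOD
        return VALUES[state]
    return rng

def algorithm(rng):
    """Deterministic: draws many values and combines them with a lag of 3."""
    history = [rng() for _ in range(21)]
    return tuple(history[i] / history[i - 3] for i in range(3, len(history)))

# Each complete run is fixed by the RNG's starting state, so enumerating the
# PERIOD possible starting states bounds the number of distinct runs.
results = {algorithm(make_rng(s)) for s in range(PERIOD)}
print(len(results), "distinct runs from", PERIOD, "starting states")
```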

 If you
 expanded what you meant by internal states to include the state of the
 algorithm (as well as the state of the RNG), then I'd be more inclined to
 agree.

If the algorithm can start out with more than one initial
state, then the RNG is not the only input.

 Worse, if you have multiple threads / processes using the same RNG, the
 individual threads could exhibit _much_ more random behavior

Then you haven't got a deterministic algorithm.

--
Greg


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-11 Thread Greg Ewing
BJörn Lindqvist wrote:

 I don't know how difficult it is to get rid of the
 implicit return None or even if it is doable, but if it is, it
 should, IMHO, be done.

It's been proposed before, and the conclusion was that
it would cause more problems than it would solve.

(Essentially it would require returning some object
that raised an exception when anything at all was
done to it, but such an object would cause debuggers
and other introspective code to choke.)

--
Greg


Re: [Python-Dev] Switch statement

2006-06-11 Thread Greg Ewing
Talin wrote:

 Since you don't have the 'fall-through' behavior of C, I would also 
 assume that you could associate more than one value with a case, i.e.:
 
 case 'a', 'b', 'c':
...

Multiple values could be written

   case 'a':
   case 'b':
   case 'c':
 ...

without conflicting with the no-fallthrough semantics, since
a do-nothing case can be written as

   case 'd':
 pass

 I don't have any specific syntax proposals, but I notice that the suite 
 that follows the switch statement is not a normal suite, but a 
 restricted one,

I don't see that as a problem. And all the proposed syntaxes
I've ever seen for putting the cases at the same level as
the switch look ugly to me.

--
Greg


Re: [Python-Dev] Switch statement

2006-06-11 Thread Greg Ewing
[EMAIL PROTECTED] wrote:

 I agree, but that of course limits the expressions to constants which can be
 evaluated at compile-time as I indicated in my previous mail.

A way out of this would be to define the semantics so that
the expression values are allowed to be cached, and the
order of evaluation and testing is undefined. So the first
time through, the values could all be put in a dict, to
be looked up thereafter.
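A minimal Python 3 sketch of those semantics, using a hypothetical CachedSwitch helper: the case expressions are evaluated once, on first use, into a dict, and every later dispatch is a single lookup. Where two cases produce the same value, this sketch keeps the first one; under the semantics above, that choice would be left undefined.

```python
class CachedSwitch:
    """Evaluate case expressions once, on first use, then dispatch via a dict."""

    def __init__(self, cases, default):
        # cases: list of (zero-argument callable producing a case value, handler)
        self._cases = cases
        self._default = default
        self._table = None  # not built until the first dispatch

    def __call__(self, value):
        if self._table is None:
            table = {}
            for make_value, handler in self._cases:
                # setdefault keeps the first handler if two cases collide.
                table.setdefault(make_value(), handler)
            self._table = table
        return self._table.get(value, self._default)()

evaluated = []

def case_value(v):
    """Return a case expression that records when it is evaluated."""
    def make():
        evaluated.append(v)
        return v
    return make

switch = CachedSwitch([(case_value("a"), lambda: "an a"),
                       (case_value("b"), lambda: "a b")],
                      default=lambda: "something else")
print(switch("b"), switch("a"), switch("z"), evaluated)
```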

--
Greg


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-11 Thread Terry Jones
 Greg == Greg Ewing [EMAIL PROTECTED] writes:

Greg Terry Jones wrote:
 Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
 algorithm from taking multiple already returned values and combining them
 in some (deterministic) way to generate > 5 outcomes.

Greg No, it's not. As long as the RNG output is the only input to
Greg the algorithm, and the algorithm is deterministic, it is
Greg not possible to get more than N different outcomes. It doesn't
Greg matter what the algorithm does with the input.

Greg If the algorithm can start out with more than one initial
Greg state, then the RNG is not the only input.

The code below uses a RNG with period 5, is deterministic, and has one
initial state. It produces 20 different outcomes.

It's just doing a simplistic version of what a lagged RNG generator does,
but the lagged part is in the algorithm not in the rng. That's why I said
if you included the state of the algorithm in what you meant by state I'd
be more inclined to agree.

Terry



n = map(float, range(1, 17, 3))
i = 0

def rng():
    global i
    i += 1
    if i == 5: i = 0
    return n[i]

if __name__ == '__main__':
    seen = {}
    history = [rng()]
    o = 0
    for lag in range(1, 5):
        for x in range(5):
            o += 1
            new = rng()
            outcome = new / history[-lag]
            if outcome in seen: print "DUP!"
            seen[outcome] = True
            print "outcome %d = %f" % (o, outcome)
            history.append(new)


# Outputs
outcome 1 = 1.750000
outcome 2 = 1.428571
outcome 3 = 1.300000
outcome 4 = 0.076923
outcome 5 = 4.000000
outcome 6 = 7.000000
outcome 7 = 2.500000
outcome 8 = 1.857143
outcome 9 = 0.100000
outcome 10 = 0.307692
outcome 11 = 0.538462
outcome 12 = 10.000000
outcome 13 = 3.250000
outcome 14 = 0.142857
outcome 15 = 0.400000
outcome 16 = 0.700000
outcome 17 = 0.769231
outcome 18 = 13.000000
outcome 19 = 0.250000
outcome 20 = 0.571429


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Terry Reedy

Fred L. Drake, Jr. [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Sunday 11 June 2006 16:26, Sam Ruby wrote:
  Planet is a feed aggregator written in Python.  It depends heavily on
  SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
  and I've submitted a test case and a patch[1] (use or discard the patch,
  it is the test that I care about).
...
  and which are original.  (Note: feeds often contain such abominations as
  &amp;copy; which the new code will treat indistinguishably from &copy;)

 It really sounds like sgmllib is the wrong foundation for this.
...
 Have you looked at HTMLParser as an alternate to sgmllib?
 It has better support for XHTML constructs.

Have you (the OP), checked how related Python projects, such as Mark 
Pilgrim's feed parser,
http://www.feedparser.org/
handle the same sort of input (I have only looked at docs and tests, not 
code).

tjr





Re: [Python-Dev] Import semantics

2006-06-11 Thread Terry Reedy

Fabio Zadrozny [EMAIL PROTECTED] wrote in message
Jython 2.1 on java1.5.0 (JIT: null)
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32

Jython 2.1 intends to match Python 2.1, I believe.
Python 2.2, which I still have loaded, matches Python 2.4 in the behavior reported.





Re: [Python-Dev] subprocess.Popen(.... stdout=IGNORE, ...)

2006-06-11 Thread Terry Reedy

Martin Blais [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
   Any idea how this idiom could be supported using a more portable
  solution (i.e. how would I make this idiom under Windows, is there
   some equivalent to /dev/null)?

On a DOS/Windows command line,  'NUL:' or 'nul:'
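From Python itself, a portable way to get the same effect is os.devnull, which names the platform's null device ('nul' on Windows, '/dev/null' on POSIX). Opening it and passing the file object as stdout discards the output on either platform. This is a minimal sketch; run_quietly is a hypothetical helper name, and on Python 3.3 and later subprocess.DEVNULL does the same job without opening a file.

```python
import os
import subprocess
import sys


def run_quietly(args):
    """Run a command with its stdout discarded and return its exit status.

    os.devnull is the platform's null device, so no 'NUL:' versus
    '/dev/null' special-casing is needed.
    """
    with open(os.devnull, "w") as devnull:
        return subprocess.call(args, stdout=devnull)


if __name__ == "__main__":
    # The child prints a line, but it goes to the null device.
    status = run_quietly([sys.executable, "-c", "print('discarded')"])
    print(status)
```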






Re: [Python-Dev] Should hex() yield 'L' suffix for long numbers?

2006-06-11 Thread Tim Peters
[Ka-Ping Yee]
 I did this earlier:

  hex(9999999999999)
 '0x9184e729fffL'

 and found it a little jarring, because i feel there's been a general
 trend toward getting rid of the 'L' suffix in Python.

 Literal long integers don't need an L anymore; they're automatically
 made into longs if the number is too big.  And while the repr() of
 a long retains the L on the end, the str() of a long does not, and
 i rather like that.

 So i kind of expected that hex() would not include the L either.
 I see its main job as just giving me the hex digits (in fact, for
 Python 3000 i'd prefer even to drop the '0x' as well), and the L
 seems superfluous and distracting.

 What do you think?  Is Python 2.5 a reasonable time to drop this L?

As I read pep 237, that should have happened in Python 2.3 or 2.4.
This specific case is kinda muddy there.  Regardless, the only part
that was left for Python 3 was phase C, and this is phase C in its
entirety:

 C. The trailing 'L' is dropped from repr(), and made illegal on
   input.  (If possible, the 'long' type completely disappears.)

It's possible, though, that hex() and oct() were implicitly considered
to be variants of repr() for purposes of phase C.  How much are we
willing to pay Guido to Pronounce?
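Whatever the pronouncement, code that only wants the digits can sidestep both the '0x' prefix and the trailing 'L' by using string formatting instead of hex(). A minimal sketch, assuming non-negative input; hex_digits is a hypothetical helper name:

```python
def hex_digits(n):
    """Return the hex digits of a non-negative integer, with no '0x' or 'L'.

    The %x format never adds a prefix, and unlike hex() in Python 2 it
    does not append 'L' to longs.
    """
    if n < 0:
        raise ValueError("hex_digits() expects a non-negative integer")
    return "%x" % n


if __name__ == "__main__":
    print(hex_digits(9999999999999))  # 9184e729fff
```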


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-11 Thread Tim Peters
[Terry Jones]
 The code below uses a RNG with period 5, is deterministic, and has one
 initial state. It produces 20 different outcomes.

Well, I'd call the sequence of 20 numbers it produces one outcome.
From that view, there are at most 5 outcomes it can produce (at most 5
distinct 20-number sequences).  In much the same way, there are at
most P distinct infinite sequences this can produce, if the PRNG used
by random.random() has period P:

def belch():
    import random, math
    start = random.random()
    i = 0
    while True:
        i += 1
        yield math.fmod(i * start, 1.0)

The trick is to define outcome in such a way that the original claim
is true :-)
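Tim's counting can be checked against Terry's example. The sketch below (make_rng and lagged_outcomes are hypothetical names) reruns the same lagged-ratio algorithm once from each of the five possible RNG states. A single run can yield 20 distinct numbers, as Terry's output shows, but the number of distinct 20-number sequences is bounded by the number of starting states.

```python
def make_rng(start, values=(1.0, 4.0, 7.0, 10.0, 13.0)):
    """Return a period-5 generator whose hidden index begins at `start`.

    Like Terry's rng(), it advances the index before reading, so start=0
    reproduces his run exactly.
    """
    state = [start]

    def rng():
        state[0] = (state[0] + 1) % len(values)
        return values[state[0]]

    return rng


def lagged_outcomes(rng):
    """Run the lagged-ratio algorithm; return its 20 outcomes as one tuple."""
    history = [rng()]
    outcomes = []
    for lag in range(1, 5):
        for _ in range(5):
            new = rng()
            outcomes.append(new / history[-lag])
            history.append(new)
    return tuple(outcomes)


# One run per possible starting state of the period-5 RNG.
sequences = set(lagged_outcomes(make_rng(s)) for s in range(5))
print(len(sequences))  # 5: one distinct 20-number sequence per starting state
```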


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby
Fred L. Drake, Jr. wrote:
 On Sunday 11 June 2006 16:26, Sam Ruby wrote:
   Planet is a feed aggregator written in Python.  It depends heavily on
   SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
   and I've submitted a test case and a patch[1] (use or discard the patch,
   it is the test that I care about).
 
 And it's a nice aggregator to use, indeed!
 
   While looking around, a few things surfaced.  For starters, it would
   seem that the version of sgmllib in SVN HEAD will selectively unescape
   certain character references that might appear in an attribute.  I say
   selectively, as:
  
 * it will unescape  &amp;
 * it won't unescape &copy;
 * it will unescape  &#38;
 * it won't unescape &#x26;
 * it will unescape  &#146;
 * it won't unescape &#8217;
 
 And just why would you use sgmllib to handle RSS or ATOM feeds?  Neither is 
 defined in terms of SGML.  The sgmllib documentation also notes that it isn't 
 really a fully general SGML parser (it isn't), but that it exists primarily 
 as a foundation for htmllib.

The feed itself is read first with SAX (with a fallback to sgmllib if
the feed is not well formed, but that's beside the point).  The
embedded HTML portions are then processed with subclasses of
sgmllib.
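That two-stage approach can be sketched as follows. This is not Planet's or the feed parser's code: sgmllib no longer exists in Python 3, so html.parser.HTMLParser stands in for the lenient fallback, and TextCollector, LenientCollector and extract_text are hypothetical names.

```python
import xml.sax
from html.parser import HTMLParser


class TextCollector(xml.sax.ContentHandler):
    """Strict path: gathers character data from a well-formed document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


class LenientCollector(HTMLParser):
    """Fallback path: a forgiving parser standing in for sgmllib."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(document):
    """Parse strictly with SAX first; fall back if the input is not well formed.

    Returns the collected text and the name of the path that produced it.
    """
    handler = TextCollector()
    try:
        xml.sax.parseString(document.encode("utf-8"), handler)
        return "".join(handler.chunks), "sax"
    except xml.sax.SAXParseException:
        # Partial SAX output is discarded; the lenient parser starts over.
        fallback = LenientCollector()
        fallback.feed(document)
        fallback.close()
        return "".join(fallback.chunks), "fallback"
```

For example, extract_text("<feed><entry>Hi</entry></feed>") takes the SAX path, while the mismatched "<feed><entry>Hi</feed>" is handed to the fallback parser.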

   There are a number of issues here.  While not unescaping anything is
   suboptimal, at least the recipient is aware of exactly which characters
   have been unescaped (i.e., none of them).  The proposed solution makes
   it impossible for the recipient to know which characters are unescaped,
   and which are original.  (Note: feeds often contain such abominations as
   &amp;copy; which the new code will treat indistinguishably from &copy;)
 
 My suspicion is that the right thing to do at the sgmllib level is to 
 categorize the markup and call a method depending on what the entity 
 reference is, and let that handle whatever it is.  For SGML, that means we 
 have things like &name; (entity references), &#123; (character references), 
 and that's it.  &#x123; isn't legal SGML under any circumstance; 
 the &#xnumber; syntax was introduced with XML.

... but it effectively is valid HTML.  And as you point out below 
sgmllib's raison d’être is to support htmllib.
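Fred's suggestion of classifying each reference and calling a kind-specific method might look like the sketch below. ReferenceDispatcher, HexAware and the handle_* methods are hypothetical, not part of sgmllib. By default each method returns the reference unchanged, so a subclass opts in to whatever unescaping it wants, including the hexadecimal form that HTML as practiced uses even though SGML does not allow it.

```python
import re

# One alternative per kind: decimal (&#38;), hexadecimal (&#x26;), named (&amp;).
REFERENCE = re.compile(r"&(?:#([0-9]+)|#x([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


class ReferenceDispatcher(object):
    """Route each reference in a string to a method for its kind."""

    def handle_entity_ref(self, name):
        return "&%s;" % name

    def handle_decimal_charref(self, digits):
        return "&#%s;" % digits

    def handle_hex_charref(self, digits):
        return "&#x%s;" % digits

    def process(self, text):
        def dispatch(match):
            decimal, hexadecimal, name = match.groups()
            if decimal is not None:
                return self.handle_decimal_charref(decimal)
            if hexadecimal is not None:
                return self.handle_hex_charref(hexadecimal)
            return self.handle_entity_ref(name)
        return REFERENCE.sub(dispatch, text)


class HexAware(ReferenceDispatcher):
    """Example subclass: resolve only hexadecimal character references."""

    def handle_hex_charref(self, digits):
        return chr(int(digits, 16))
```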

   Additionally, there is a unicode issue here - one that is shared by
   handle_charref, but at least that method is overrideable.  If unescaping
   remains, do it for hex character references and for values greater than
   8-bits, i.e., use unichr instead of chr if the value is greater than 127.
 
 For SGML, it's worse than that, since the document character set is defined in
 the SGML declaration, which is a far hairier beast than an XML declaration.  :-)

understood

 It really sounds like sgmllib is the wrong foundation for this.  While the 
 module has some questionable behaviors, none of them are significant in its 
 intended context (support for htmllib).  Now, I understand that 
 RSS has historical issues, with HTML-as-practiced getting embedded as payload 
 data with various flavors of escaping applied, and I'm not an expert in the 
 details of that.  Have you looked at HTMLParser as an alternate to sgmllib?  
 It has better support for XHTML constructs.

HTMLParser is less forgiving, and generally less suitable for consuming 
HTML as practiced.

- Sam Ruby



Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby
Terry Reedy wrote:
 Fred L. Drake, Jr. [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]
 On Sunday 11 June 2006 16:26, Sam Ruby wrote:
 Planet is a feed aggregator written in Python.  It depends heavily on
 SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
 and I've submitted a test case and a patch[1] (use or discard the patch,
 it is the test that I care about).
 ...
 and which are original.  (Note: feeds often contain such abominations as
 &amp;copy; which the new code will treat indistinguishably from &copy;)
 
 It really sounds like sgmllib is the wrong foundation for this.
 ...
 Have you looked at HTMLParser as an alternate to sgmllib?
 It has better support for XHTML constructs.
 
 Have you (the OP), checked how related Python projects, such as Mark 
 Pilgrim's feed parser,
 http://www.feedparser.org/
 handle the same sort of input (I have only looked at docs and tests, not 
 code).

Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.

I'm a committer on that project:

http://sourceforge.net/project/memberlist.php?group_id=112328

I was investigating a bug in sgmllib which affected the feed parser (and 
therefore Planet), and noticed that there were changes in the SVN head 
of Python which broke three feed parser unit tests.

It is my belief that these changes will break other existing users of 
sgmllib.

- Sam Ruby


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Fred L. Drake, Jr.
On Monday 12 June 2006 00:05, Sam Ruby wrote:
  Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.

Cool.

  I was investigating a bug in sgmllib which affected the feed parser (and
  therefore Planet), and noticed that there were changes in the SVN head
  of Python which broke three feed parser unit tests.
 
  It is my belief that these changes will break other existing users of
  sgmllib.

This is good to know; thanks for pointing it out.

If you can summarize the specific changes to sgmllib that cause problems for 
the feed parser, and identify the tests there that rely on the old behavior, 
I'll be glad to look at the problems.  I expect to have some time in the next 
few evenings, so I should be able to look at these soon.

Is the SourceForge CVS the definitive development source for the feed parser?


  -Fred

-- 
Fred L. Drake, Jr.   fdrake at acm.org


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Martin v. Löwis
Sam Ruby wrote:
 Planet is a feed aggregator written in Python.  It depends heavily on 
 SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
 and I've submitted a test case and a patch[1] (use or discard the patch, 
 it is the test that I care about).

I think (but am not sure) you are referring to patch #1462498 here,
which fixes bugs 1452246 and 1087808.

   * it will unescape  &amp;
   * it won't unescape &copy;

That must be because you have amp in your entitydefs, but not copy.

   * it will unescape  &#38;
   * it won't unescape &#x26;

That's because it doesn't recognize hex character references.
That's systematic, though: it doesn't just ignore them in attribute
values, but also in content.

   * it will unescape  &#146;
   * it won't unescape &#8217;

That's because the value is larger than 255, so chr() fails.

 There are a number of issues here.  While not unescaping anything is 
 suboptimal, at least the recipient is aware of exactly which characters 
 have been unescaped (i.e., none of them).  The proposed solution makes 
 it impossible for the recipient to know which characters are unescaped, 
 and which are original.  (Note: feeds often contain such abominations as 
 &amp;copy; which the new code will treat indistinguishably from &copy;)

The recipient should then add copy to entitydefs; sgmllib will then
unescape &copy; itself, so the recipient can know not to unescape any
&copy; that remains.

Alternatively, the recipient could provide an empty entitydefs.
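The effect of the entitydefs table can be illustrated with a small model of single-pass named-reference unescaping. unescape_named is a hypothetical stand-in, not sgmllib's implementation. Because the input is scanned only once, the ampersand produced from &amp; is never re-read, so &amp;copy; turns into the literal text "&copy;". Whether that result can be told apart from an original &copy; depends only on whether copy is in the table.

```python
import re

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unescape_named(text, entitydefs):
    """Replace &name; when name is a key of entitydefs; leave the rest alone.

    re.sub makes a single left-to-right pass, so replacement text is not
    scanned again: '&amp;copy;' yields '&copy;', not the copyright sign.
    """
    def replace(match):
        return entitydefs.get(match.group(1), match.group(0))
    return NAMED_REF.sub(replace, text)


sample = "&amp;copy; &copy;"
# Only amp known: both halves end up as '&copy;', so they are indistinguishable.
only_amp = unescape_named(sample, {"amp": "&"})
# copy known too: the original &copy; is resolved, the escaped one is not.
amp_and_copy = unescape_named(sample, {"amp": "&", "copy": "\u00a9"})
# Empty table: nothing is unescaped at all.
nothing = unescape_named(sample, {})
```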

 Additionally, there is a unicode issue here - one that is shared by 
 handle_charref, but at least that method is overrideable.  If unescaping 
 remains, do it for hex character references and for values greater than 
 8-bits, i.e., use unichr instead of chr if the value is greater than 127.

Alternatively, a callback function could be provided for character
references. Unfortunately, the existing callback is unsuitable,
as it is supposed to do the full processing; this callback should
return the replacement text. Generally assuming Unicode would be
wrong, though.

Would you like to contribute a patch?

Regards,
Martin


Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Martin v. Löwis
Neal Norwitz wrote:
 The most important outstanding issue is the xmlplus/xmlcore issue.
 It's not going to get fixed unless someone works on it.  There's only
 a few days left before beta 1.  Can someone please address this?

From my point of view, I shall consider them resolved/irrelevant:
I'm going to step down as a PyXML maintainer, so I don't have to
worry anymore about how to maintain PyXML. If PyXML then goes
unmaintained, the problem goes away; otherwise, the new maintainer
will have to find a solution.

Regards,
Martin


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby
Martin v. Löwis wrote:
 
 Alternatively, a callback function could be provided for character
 references. Unfortunately, the existing callback is unsuitable,
 as it is supposed to do the full processing; this callback should
 return the replacement text. Generally assuming Unicode would be
 wrong, though.
 
 Would you like to contribute a patch?

If we can agree on the behavior, I would be glad to write up a patch.

It seems to me that the simplest way to proceed would be for the code 
that attempts to resolve character references (both named and numeric) 
in attributes to be isolated in a single method.  Subclasses that desire 
different behavior (including the existing Python 2.4 and prior 
behaviour) could simply override this method.
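One reading of that proposal, as a sketch: every reference found in an attribute value is passed to a single method, and a subclass that wants the Python 2.4 behaviour overrides it to return None, which leaves the reference untouched. AttributeUnescaper, convert_reference and LeaveReferencesAlone are hypothetical names. The code uses Python 3's chr(), which covers the full Unicode range; in Python 2, chr() only covers 0 to 255 and unichr() would be needed above that.

```python
import re

# Named, decimal or hexadecimal reference; group 1 is the text between & and ;.
REFERENCE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


class AttributeUnescaper(object):
    """Resolves references in attribute values through one overridable hook."""

    entitydefs = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

    def convert_reference(self, ref):
        """Return replacement text for ref, or None to keep it as written."""
        if ref.startswith("#"):
            if ref.startswith("#x"):
                code = int(ref[2:], 16)
            else:
                code = int(ref[1:])
            # Out-of-range code points are left alone rather than raising.
            return chr(code) if code <= 0x10FFFF else None
        return self.entitydefs.get(ref)

    def unescape(self, value):
        def replace(match):
            text = self.convert_reference(match.group(1))
            return match.group(0) if text is None else text
        return REFERENCE.sub(replace, value)


class LeaveReferencesAlone(AttributeUnescaper):
    """Python 2.4-style behaviour: attribute values pass through unchanged."""

    def convert_reference(self, ref):
        return None
```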

- Sam Ruby