date:20060611

[Python-Dev] crash in dict on gc collect

2006-06-11 Thread Neal Norwitz

I wonder if this is similar to Kevin's problem?  I couldn't reproduce
his problem though.  This happens with both debug and release builds.
Not sure how to reduce the test case.  pychecker was just iterating
through the byte codes.  It wasn't doing anything particularly
interesting.

./python pychecker/pychecker/checker.py Lib/encodings/cp1140.py

0x004cfa18 in visit_decref (op=0x661180, data=0x0) at gcmodule.c:270
270 if (PyObject_IS_GC(op)) {
(gdb) bt
#0  0x004cfa18 in visit_decref (op=0x661180, data=0x0) at gcmodule.c:270
#1  0x004474ab in dict_traverse (op=0x7cdd90, visit=0x4cf9e0,
arg=0x0) at dictobject.c:1819
#2  0x004cfaf0 in subtract_refs (containers=0x670240) at gcmodule.c:295
#3  0x004d07fd in collect (generation=0) at gcmodule.c:790
#4  0x004d0ad1 in collect_generations () at gcmodule.c:897
#5  0x004d1505 in _PyObject_GC_Malloc (basicsize=56) at gcmodule.c:1332
#6  0x004d1542 in _PyObject_GC_New (tp=0x64f4a0) at gcmodule.c:1342
#7  0x0041d992 in PyInstance_NewRaw (klass=0x2a95dffcc0,
dict=0x800e80) at classobject.c:505
#8  0x0041dab8 in PyInstance_New (klass=0x2a95dffcc0,
arg=0x2a95f5f9e0, kw=0x0) at classobject.c:525
#9  0x0041aa4e in PyObject_Call (func=0x2a95dffcc0,
arg=0x2a95f5f9e0,  kw=0x0) at abstract.c:1802
#10 0x0049ecd2 in do_call (func=0x2a95dffcc0,
pp_stack=0x7fbfffb5b0,  na=3, nk=0) at ceval.c:3785
#11 0x0049e46f in call_function (pp_stack=0x7fbfffb5b0,
oparg=3) at ceval.c:3597
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Add pure python PNG writer module to stdlib?

2006-06-11 Thread Bob Ippolito

On Jun 10, 2006, at 10:52 PM, Johann C. Rocholl wrote:

>>> Does anybody think it could go into stdlib before the feature  
>>> freeze for
>> 2.5?
>>
>> Nope.  To get added to the stdlib there needs to be support from the
>> community that the module is useful and best-of-breed.  Try  
>> posting on
>> c.l.py and see if people pick it up and like it.  No way that is  
>> going to
>> happen before b1.  But there is always 2.6 .
>
> That's what I thought. My remote hope was that there would be
> immediate consensus on python-dev about both the 'useful' and
> 'best-of-breed' parts. Anybody else with a +1? ;-)
>
> Seriously, it's totally fine with me if the module doesn't make it
> into 2.5, or even if it never makes it into stdlib. I'm just offering
> it with some enthusiasm.

The best way to do this would be to make it available as its own  
package. Give it a setup.py, stick it on CheeseShop, etc.

For performance and memory usage reasons it would probably make sense  
to take an iterator that returns a scanline at a time. The current  
implementation does a lot more allocations than it needs to (full  
image, then one str per scanline). It also asserts type is str, when  
a buffer or mmap object would work perfectly well otherwise. If  
reading from a file or something you could skip the full allocation  
and a lot of memcpy by reading a scanline at a time.
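
The scanline-at-a-time idea could look roughly like the sketch below. It is not the module under discussion: write_png and png_chunk are hypothetical names, and the sketch assumes 8-bit RGB rows with no filtering. Any bytes-like row (bytes, bytearray, memoryview) is accepted, and only one row is held at a time.

```python
import struct
import zlib


def png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def write_png(out, width, height, scanlines):
    """Write an 8-bit RGB PNG, pulling one scanline at a time.

    `scanlines` is any iterable of bytes-like rows, each width * 3 bytes.
    """
    out.write(b"\x89PNG\r\n\x1a\n")
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    out.write(png_chunk(b"IHDR", header))
    compressor = zlib.compressobj()
    for row in scanlines:
        # Each row is prefixed with filter type 0 (no filtering).
        data = compressor.compress(b"\x00") + compressor.compress(row)
        if data:
            out.write(png_chunk(b"IDAT", data))
    # Flush whatever the compressor is still holding, then close the file.
    out.write(png_chunk(b"IDAT", compressor.flush()))
    out.write(png_chunk(b"IEND", b""))
```

RGBA output would use colour type 6 and rows of width * 4 bytes; that variant is left out to keep the sketch short.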

I'd also like to see RGBA support as well. Often the reason for  
choosing png over other lossless formats is its support for alpha.  
For your use case it's irrelevant, but there are many use cases that  
need the alpha channel.

But to reiterate, further discussion of this really belongs on c.l.py  
for now...

-bob


Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Fredrik Lundh

Fred L. Drake, Jr. wrote:

> With the introduction of the xmlcore package in Python 2.5, should we 
> document 
> xml.etree or xmlcore.etree?  If someone installs PyXML with Python 2.5, I 
> don't think they're going to get xml.etree, which will be really confusing.  
> We can be sure that xmlcore.etree will be there.

I think it would be unfortunate if an external, mostly unmaintained 
package could claim absolute ownership of the xml package root.

how about tweaking the xml loader to map "xml.foo" to "_xmlplus.foo" 
only if that subpackage really exists ?


Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Simon Percivall

On 11 jun 2006, at 12.09, Fredrik Lundh wrote:
> Fred L. Drake, Jr. wrote:
>
>> With the introduction of the xmlcore package in Python 2.5, should  
>> we document
>> xml.etree or xmlcore.etree?  If someone installs PyXML with Python  
>> 2.5, I
>> don't think they're going to get xml.etree, which will be really  
>> confusing.
>> We can be sure that xmlcore.etree will be there.
>
> I think it would be unfortunate if an external, mostly unmaintained
> package could claim absolute ownership of the xml package root.
>
> how about tweaking the xml loader to map "xml.foo" to "_xmlplus.foo"
> only if that subpackage really exists ?

I'm a bit confused by what the problem is. I thought this was all
handled like it should be now.

 >>> import xml.etree
 >>> xml.etree
 
 >>> import xml.sax
 >>> xml.sax
 

It picks up modules from both places

//Simon

Re: [Python-Dev] UUID module

2006-06-11 Thread Giovanni Bajo

Ka-Ping Yee <[EMAIL PROTECTED]> wrote:

> Quite a few people have expressed interest in having UUID
> functionality in the standard library, and previously on this
> list some suggested possibly using the uuid.py module i wrote:
>
> http://zesty.ca/python/uuid.py

Some comments on the code:

> for dir in ['', r'c:\windows\system32', r'c:\winnt\system32']:

Can we get rid of these absolute paths? Something like this should suffice:

>>> from ctypes import *
>>> buf = create_string_buffer(4096)
>>> windll.kernel32.GetSystemDirectoryA(buf, 4096)
17
>>> buf.value.decode("mbcs")
u'C:\\WINNT\\system32'

>  for function in functions:
>      try:
>          _node = function()
>      except:
>          continue

This also hides typos and whatnot. I guess it's better if each function catches
its own exceptions, and either returns None or raises a common exception (like a
class _GetNodeError(RuntimeError)) which is then caught.

Giovanni Bajo


Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Fredrik Lundh

Simon Percivall wrote:

>> how about tweaking the xml loader to map "xml.foo" to "_xmlplus.foo"
>> only if that subpackage really exists ?
> 
> I'm a bit confused by what the problem is. I thought this was all
> handled like it should be now.

that's how I thought things were done, but then I read Fred's post, and 
looked at the source code, and didn't see this line:

 _xmlplus.__path__.extend(xmlcore.__path__)
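
The effect of a line like that can be tried out with two throwaway packages. In the sketch below, demo_plus and demo_core are hypothetical stand-ins for _xmlplus and xmlcore, created in a temporary directory; it assumes regular packages (with __init__.py), whose __path__ is a plain list.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, modules):
    """Create a regular package `name` under `root` with the given modules."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    for filename, source in [("__init__.py", "")] + list(modules.items()):
        with open(os.path.join(pkg_dir, filename), "w") as f:
            f.write(source)


root = tempfile.mkdtemp()
make_package(root, "demo_plus", {"sax.py": "ORIGIN = 'plus'\n"})
make_package(root, "demo_core", {"etree.py": "ORIGIN = 'core'\n"})
sys.path.insert(0, root)
importlib.invalidate_caches()

demo_plus = importlib.import_module("demo_plus")
demo_core = importlib.import_module("demo_core")

# The line under discussion: let the add-on package fall back to the core one.
demo_plus.__path__.extend(demo_core.__path__)

sax = importlib.import_module("demo_plus.sax")      # found in demo_plus itself
etree = importlib.import_module("demo_plus.etree")  # found via demo_core's path
print(sax.ORIGIN, etree.ORIGIN)
```

Submodules present in the first package win, and anything missing there is picked up from the second, which matches the "picks up modules from both places" behaviour described earlier in the thread.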

or-maybe-someone's-been-using-the-time-machine-ly yrs /F


[Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby

Planet is a feed aggregator written in Python.  It depends heavily on 
SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
and I've submitted a test case and a patch[1] (use or discard the patch, 
it is the test that I care about).

While looking around, a few things surfaced.  For starters, it would 
seem that the version of sgmllib in SVN HEAD will selectively unescape 
certain character references that might appear in an attribute.  I say 
selectively, as:

  * it will unescape  &amp;
  * it won't unescape &copy;
  * it will unescape  &#38;
  * it won't unescape &#x26;
  * it will unescape  &#146;
  * it won't unescape &#8217;

There are a number of issues here.  While not unescaping anything is 
suboptimal, at least the recipient is aware of exactly which characters 
have been unescaped (i.e., none of them).  The proposed solution makes 
it impossible for the recipient to know which characters are unescaped, 
and which are original.  (Note: feeds often contain such abominations as 
&amp;copy; which the new code will treat indistinguishably from &copy;)

Additionally, there is a unicode issue here - one that is shared by 
handle_charref, but at least that method is overrideable.  If unescaping 
remains, do it for hex character references and for values greater than 
8-bits, i.e., use unichr instead of chr if the value is greater than 127.
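
As a rough illustration of uniform handling, the sketch below unescapes both decimal and hexadecimal character references and leaves named entities untouched. It is written for Python 3, where chr() covers what unichr() did in 2.x; unescape_charrefs is a hypothetical helper, not part of sgmllib.

```python
import re

# Matches decimal (&#8217;) and hexadecimal (&#x2019;) character references.
_CHARREF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def unescape_charrefs(text):
    """Replace numeric character references, leaving named entities alone."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            return match.group(0)  # out-of-range reference: leave untouched
    return _CHARREF.sub(replace, text)
```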

- Sam Ruby

[1] http://tinyurl.com/j4a6n

Re: [Python-Dev] Switch statement

2006-06-11 Thread Talin

Greg Ewing wrote:
> [EMAIL PROTECTED] wrote:
> 
> 
>> switch raw_input("enter a, b or c: "):
>>     case 'a':
>>         print 'yay! an a!'
>>     case 'b':
>>         print 'yay! a b!'
>>     case 'c':
>>         print 'yay! a c!'
>>     else:
>>         print 'hey dummy! I said a, b or c!'
> 
> 
> Before accepting this, we could do with some debate about the
> syntax. It's not a priori clear that C-style switch/case is
> the best thing to adopt.

Since you don't have the 'fall-through' behavior of C, I would also 
assume that you could associate more than one value with a case, i.e.:

case 'a', 'b', 'c':
   ...

It seems to me that the value of a 'switch' statement is that it is a 
computed jump - that is, instead of having to iteratively test a bunch 
of alternatives, you can directly jump to the code for a specific value.

I can see this being very useful for parser generators and state machine 
code. At the moment, similar things can be done with hash tables of 
functions, but those have a number of limitations, such as the fact that 
they can't access local variables.
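
For comparison, the hash-table-of-functions approach looks roughly like this in present-day Python. All names are illustrative. Note how the handlers cannot see the caller's locals, so shared state has to be passed in explicitly, which is the limitation mentioned above.

```python
def handle_a(state):
    state["count"] += 1
    return "yay! an a!"


def handle_bc(state):
    state["count"] += 10
    return "yay! a b or c!"


def handle_default(state):
    return "hey dummy! I said a, b or c!"


# Built once; several keys may share one handler, like `case 'b', 'c':`.
DISPATCH = {"a": handle_a, "b": handle_bc, "c": handle_bc}


def process(key, state):
    # One hash lookup instead of a chain of comparisons.
    handler = DISPATCH.get(key, handle_default)
    # Handlers cannot access process()'s locals, so state is an argument.
    return handler(state)
```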

I don't have any specific syntax proposals, but I notice that the suite 
that follows the switch statement is not a normal suite, but a 
restricted one, and I am wondering if we could come up with a syntax 
that avoids having a special suite.

Here's an (ugly) example, not meant as a serious proposal:

select (x) when 'a':
   ...
when 'b', 'c':
   ...
else:
   ...

The only real difference between this and an if-else chain is that the 
compiler knows that all of the test expressions are constants and can be 
hashed at compile time.

-- Talin

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Aahz

On Sun, Jun 11, 2006, Sam Ruby wrote:
>
> Planet is a feed aggregator written in Python.  It depends heavily on 
> SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
> and I've submitted a test case and a patch[1] (use or discard the patch, 
> it is the test that I care about).
> 
> [1] http://tinyurl.com/j4a6n

When providing links to SF, please use the python.org tinyurl equivalent
to ensure that people can easily see the bug/patch number:

http://www.python.org/sf?id=1504333
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"I saw `cout' being shifted "Hello world" times to the left and stopped
right there."  --Steve Gonedes

[Python-Dev] Import semantics

2006-06-11 Thread Fabio Zadrozny

Python and Jython import semantics differ on how sub-packages should be
accessed after importing some module:

Jython 2.1 on java1.5.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> import xml
>>> xml.dom

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml
>>> xml.dom
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'dom'
>>> from xml.dom import pulldom
>>> xml.dom

Note that in Jython, importing a module makes all subpackages beneath it
available, whereas in Python only the tokens available in __init__.py are
accessible. However, if you load a submodule later, even without binding it
directly into your namespace, it becomes accessible too. This seems
unexpected to me: I would expect it to be available only if I did some
"import xml.dom" at some point.

My problem is that in Pydev, in static analysis, I would only get the tokens
available for actually imported modules, but that's not true for Jython, and
I'm not sure if the current behaviour in Python was expected.

So... which would be the right semantics for this?

Thanks,

Fabio
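
The CPython side of this can be reproduced with a throwaway package. The sketch below uses hypothetical names (demo_pkg, sub, leaf) created in a temporary directory and targets Python 3; it does not say anything about Jython.

```python
import importlib
import os
import sys
import tempfile

# Build demo_pkg/__init__.py, demo_pkg/sub/__init__.py, demo_pkg/sub/leaf.py.
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg", "sub")
os.makedirs(sub_dir)
for path in (
    os.path.join(root, "demo_pkg", "__init__.py"),
    os.path.join(sub_dir, "__init__.py"),
    os.path.join(sub_dir, "leaf.py"),
):
    with open(path, "w"):
        pass  # empty source files are enough
sys.path.insert(0, root)
importlib.invalidate_caches()

demo_pkg = importlib.import_module("demo_pkg")
before = hasattr(demo_pkg, "sub")  # the subpackage is not loaded yet

# Importing a submodule anywhere binds each loaded level on its parent.
importlib.import_module("demo_pkg.sub.leaf")
after = hasattr(demo_pkg, "sub")

print(before, after)
```

The attribute appears as a side effect of loading the submodule, regardless of which namespace asked for it, which is why a static analyser that only follows explicit imports in one file can miss it.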

Re: [Python-Dev] Switch statement

2006-06-11 Thread skip


talin> Since you don't have the 'fall-through' behavior of C, I would
talin> also assume that you could associate more than one value with a
talin> case, i.e.:

talin> case 'a', 'b', 'c':
talin>    ...

As Andrew Koenig pointed out, that's not discussed in the PEP.  Given the
various examples though, I would have to assume the above is equivalent to

case ('a', 'b', 'c'):
    ...

since in all cases the PEP implies a single expression.

talin> It seems to me that the value of a 'switch' statement is that it
talin> is a computed jump - that is, instead of having to iteratively
talin> test a bunch of alternatives, you can directly jump to the code
talin> for a specific value.

I agree, but that of course limits the expressions to constants which can be
evaluated at compile-time as I indicated in my previous mail.  Also, as
someone else pointed out, that probably prevents something like

START_TOKEN = '<'
END_TOKEN = '>'

...

switch expr:
    case START_TOKEN:
        ...
    case END_TOKEN:
        ...

The PEP states that the case clauses must accept constants, but the sample
implementation allows arbitrary expressions.  If we assume that the case
expressions need not be constants, does that force the compiler to evaluate
the case expressions in the order given in the file?  To make my dumb
example from yesterday even dumber:

def f():
    switch raw_input("enter b, d or f:"):
        case incr('a'):
            print 'yay! a b!'
        case incr('b'):
            print 'yay! a d!'
        case incr('c'):
            print 'yay! an f!'
        else:
            print 'hey dummy! I said b, d or f!'

_n = 0
def incr(c):
    global _n
    try:
        return chr(ord(c)+1+_n)
    finally:
        _n += 1
        print _n

The cases must be evaluated in the order they are written for the example to
work properly.

The tension between efficient run-time and Python's highly dynamic nature
would seem to prevent the creation of a switch statement that will satisfy
all demands.

Skip

Re: [Python-Dev] Switch statement

2006-06-11 Thread Fredrik Lundh

Talin wrote:

> I don't have any specific syntax proposals, but I notice that the suite 
> that follows the switch statement is not a normal suite, but a 
> restricted one, and I am wondering if we could come up with a syntax 
> that avoids having a special suite.

don't have K&R handy, but I'm pretty sure they put switch and case at 
the same level (just like if/else), thus eliminating the need for silly 
special suites.

> The only real difference between this and an if-else chain is that the 
> compiler knows that all of the test expressions are constants and can be 
> hashed at compile time.

the compiler can of course figure that out also for if/elif/else
statements, by inspecting the AST.  the only advantage for switch/case is
user syntax...
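
To make the AST argument concrete, here is a small sketch using the `ast` module of current Python 3 (3.8 or later, for ast.Constant), not the 2.5-era compiler. constant_dispatch_keys is a hypothetical helper: it recognises a chain of `name == constant` tests on one variable and returns the constants, which is the information a compiler would need to build a jump table.

```python
import ast


def constant_dispatch_keys(source):
    """Return the constants tested by an if/elif chain, or None if unsuitable."""
    node = ast.parse(source).body[0]
    keys, subject = [], None
    while isinstance(node, ast.If):
        test = node.test
        ok = (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.left, ast.Name)
            and isinstance(test.comparators[0], ast.Constant)
        )
        # Every branch must compare the same variable against a literal.
        if not ok or (subject is not None and test.left.id != subject):
            return None
        subject = test.left.id
        keys.append(test.comparators[0].value)
        # An `elif` shows up as a single nested If inside `orelse`.
        if len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If):
            node = node.orelse[0]
        else:
            break
    return keys or None
```

A test against a name such as START_TOKEN is rejected by this check, since a name is not a literal constant; that is the same limitation raised elsewhere in the thread.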


Re: [Python-Dev] Switch statement

2006-06-11 Thread Talin

[EMAIL PROTECTED] wrote:
> talin> Since you don't have the 'fall-through' behavior of C, I would
> talin> also assume that you could associate more than one value with a
> talin> case, i.e.:
> 
> talin> case 'a', 'b', 'c':
> talin>    ...
> 
> As Andrew Koenig pointed out, that's not discussed in the PEP.  Given the
> various examples though, I would have to assume the above is equivalent to
> 
> case ('a', 'b', 'c'):
>     ...

I had recognized that ambiguity as well, but chose not to mention it :)

> since in all cases the PEP implies a single expression.
> 
> talin> It seems to me that the value of a 'switch' statement is that it
> talin> is a computed jump - that is, instead of having to iteratively
> talin> test a bunch of alternatives, you can directly jump to the code
> talin> for a specific value.
> 
> I agree, but that of course limits the expressions to constants which can be
> evaluated at compile-time as I indicated in my previous mail.  Also, as
> someone else pointed out, that probably prevents something like
> 
> START_TOKEN = '<'
> END_TOKEN = '>'
> 
> ...
> 
> switch expr:
>     case START_TOKEN:
>         ...
>     case END_TOKEN:
>         ...

Here's another ugly thought experiment, not meant as a serious proposal; 
its intent is to stimulate ideas by breaking preconceptions. Suppose we 
take the notion of a computed jump literally:

def myfunc( x ):
   goto dispatcher[ x ]

   section s1:
      ...

   section s2:
      ...

dispatcher = dict(a=myfunc.s1, b=myfunc.s2)

No, I am *not* proposing that Python add a goto statement. What I am 
really talking about is the idea that you could (somehow) use a 
dictionary as the input to a control construct.

In the above example, rather than allowing arbitrary constant 
expressions as cases, we would require the compiler to generate a set of 
opaque tokens representing various code fragments. These fragments would 
be exactly like inner functions, except that they don't have their own 
scope (and therefore have no parameters either).

Since the jump labels are symbols generated by the compiler, there's no 
ambiguity about when they get evaluated.

The above example also allows these labels to be accessed externally 
from the function by defining attributes on the function object itself 
which correspond to the code fragments.

So in the example, the dictionary which associates specific values with 
executable sections is created once, at runtime, but before the first 
time that myfunc is called.

Of course, this is quite a bit clumsier than a switch statement, which 
is why I say it's not a serious proposal.

-- Talin

[Python-Dev] subprocess.Popen(.... stdout=IGNORE, ...)

2006-06-11 Thread Martin Blais

In the subprocess module, by default the file handles in the child
are inherited from the parent.  To ignore a child's output, I can use
the stdout or stderr options to send the output to a pipe::

   p = Popen(command, stdout=PIPE, stderr=PIPE)

However, this is sensitive to the buffer deadlock problem, where for
example the buffer for stderr might become full and a deadlock occurs
because the child is blocked on writing to stderr and the parent is
blocked on reading from stdout or waiting for the child to finish.

For example, using this command will cause deadlock::

   call('cat /boot/vmlinuz'.split(), stdout=PIPE, stderr=PIPE)

Popen.communicate() implements a solution using either select() or
multiple threads (under Windows) to read from the pipes, and returns
the strings as a result.  It works out like this::

   p = Popen(command, stdout=PIPE, stderr=PIPE)
   output, errors = p.communicate()
   if p.returncode != 0:
…

Now, as a user of the subprocess module, sometimes I just want to
call some child process and simply ignore its output, and to do so I
am forced to use communicate() as above and wastefully capture and
ignore the strings.  This is actually quite a common use case.  "Just
run something, and check the return code".  Right now, in order to do
this without polluting the parent's output, you cannot use the call()
convenience (or is there another way?).

A workaround that works under UNIX is to do this::

   FNULL = open('/dev/null', 'w')
   returncode = call(command, stdout=FNULL, stderr=FNULL)


Some feedback requested, I'd like to know what you think:

1. Would it not be nice to add an IGNORE constant to subprocess.py
   that would do this automatically? For example::

 returncode = call(command, stdout=IGNORE, stderr=IGNORE)

   Rather than capture and accumulate the output, it would find an
   appropriate OS-specific way to ignore the output (the /dev/null file
   above works well under UNIX, how would you do this under Windows?
   I'm sure we can find something.)

2. call() should be modified to not be sensitive to the deadlock
   problem, since its interface provides no way to return the
   contents of the output.  The IGNORE value provides a possible
   solution for this.

3. With the /dev/null file solution, the following code actually
   works without deadlock, because stderr is never blocked on writing
   to /dev/null::

 p = Popen(command, stdout=PIPE, stderr=IGNORE)
 text = p.stdout.read()
 retcode = p.wait()

   Any idea how this idiom could be supported using a more portable
   solution (i.e. how would I make this idiom under Windows, is there
   some equivalent to /dev/null)?
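
For readers on current Python: versions 3.3 and later provide subprocess.DEVNULL, which plays the role of the proposed IGNORE constant on both UNIX and Windows, and os.devnull names the platform's null device. Neither existed in subprocess at the time of this message. The sketch below uses a made-up noisy child process to show both idioms from the list above.

```python
import subprocess
import sys

# A child that writes far more than a pipe buffer holds to both streams.
noisy_child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('x' * 1000000); sys.stderr.write('y' * 1000000)",
]

# Discard both streams; only the exit status comes back.
returncode = subprocess.call(
    noisy_child, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
)

# Capture stdout while discarding stderr, the idiom from point 3.
p = subprocess.Popen(noisy_child, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
text = p.stdout.read()
p.stdout.close()
retcode = p.wait()
```

Because stderr goes to the null device, the child never blocks on it, so reading stdout to the end before calling wait() cannot deadlock here.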

Re: [Python-Dev] UUID module

2006-06-11 Thread Ka-Ping Yee

Thomas Heller wrote:
> I don't know if this is the uuidgen you're talking about, but
> on linux there is libuuid:

Thanks!

Okay, that's in there now.  Have a look at http://zesty.ca/python/uuid.py .

Phillip J. Eby wrote:
> By the way, I'd love to see a uuid.uuid() constructor that simply calls the
> platform-specific default UUID constructor (CoCreateGuid or uuidgen(2)),

I've added code to make uuid1() use uuid_generate_time() if available
and uuid4() use uuid_generate_random() if available.  These functions
are provided on Mac OS X (in libc) and on Linux (in libuuid).  Does
that work for you?

I'm using the Windows UUID generation calls (UuidCreate and
UuidCreateSequential in rpcrt4) only to get the hardware address,
not to make UUIDs, because they yield results that aren't compliant
with RFC 4122.  Even worse, they actually have the variant bits set
to say that they are RFC 4122, but they can have an illegal version
number.  If there are better alternatives on Windows, i'm happy to
use them.
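
The version and variant fields mentioned above can be inspected directly. This sketch assumes the `uuid` module as found in today's standard library, which may differ in detail from the draft at the URL above.

```python
import uuid

time_based = uuid.uuid1()
random_based = uuid.uuid4()

for u in (time_based, random_based):
    # The variant lives in the top bits of clock_seq_hi_variant and the
    # version in the top four bits of time_hi_version.
    print(u, u.variant == uuid.RFC_4122, u.version)
```

A UUID whose variant is not RFC 4122 reports its version as None, so a check like this is one way to spot non-compliant values coming from a platform call.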

-- ?!ng

[Python-Dev] Should hex() yield 'L' suffix for long numbers?

2006-06-11 Thread Ka-Ping Yee

I did this earlier:

>>> hex(9999999999999)
'0x9184e729fffL'

and found it a little jarring, because i feel there's been a general
trend toward getting rid of the 'L' suffix in Python.

Literal long integers don't need an L anymore; they're automatically
made into longs if the number is too big.  And while the repr() of
a long retains the L on the end, the str() of a long does not, and
i rather like that.

So i kind of expected that hex() would not include the L either.
I see its main job as just giving me the hex digits (in fact, for
Python 3000 i'd prefer even to drop the '0x' as well), and the L
seems superfluous and distracting.
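
When only the digits are wanted, %-formatting already gives them without either decoration. A short sketch (the variable names are just for illustration):

```python
n = 9999999999999

bare_digits = "%x" % n      # no '0x' prefix and no 'L' suffix
with_prefix = "0x%x" % n    # prefix added back explicitly

print(bare_digits)
print(with_prefix)
```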

What do you think?  Is Python 2.5 a reasonable time to drop this L?


-- ?!ng

Re: [Python-Dev] a note in random.shuffle.doc ...

2006-06-11 Thread Greg Ewing

Terry Jones wrote:

> Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
> algorithm from taking multiple already returned values and combining them
> in some (deterministic) way to generate > 5 outcomes.

No, it's not. As long as the RNG output is the only input to
the algorithm, and the algorithm is deterministic, it is
not possible to get more than N different outcomes. It doesn't
matter what the algorithm does with the input.

> If you
> expanded what you meant by "internal states" to include the state of the
> algorithm (as well as the state of the RNG), then I'd be more inclined to
> agree.

If the algorithm can start out with more than one initial
state, then the RNG is not the only input.

> Worse, if you have multiple threads / processes using the same RNG, the
> individual threads could exhibit _much_ more random behavior

Then you haven't got a deterministic algorithm.

--
Greg

Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-11 Thread Greg Ewing

BJörn Lindqvist wrote:

> I don't know how difficult it is to get rid of the
> implicit "return None" or even if it is doable, but if it is, it
> should, IMHO, be done.

It's been proposed before, and the conclusion was that
it would cause more problems than it would solve.

(Essentially it would require returning some object
that raised an exception when anything at all was
done to it, but such an object would cause debuggers
and other introspective code to choke.)

--
Greg

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Fred L. Drake, Jr.

On Sunday 11 June 2006 16:26, Sam Ruby wrote:
 > Planet is a feed aggregator written in Python.  It depends heavily on
 > SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
 > and I've submitted a test case and a patch[1] (use or discard the patch,
 > it is the test that I care about).

And it's a nice aggregator to use, indeed!

 > While looking around, a few things surfaced.  For starters, it would
 > seem that the version of sgmllib in SVN HEAD will selectively unescape
 > certain character references that might appear in an attribute.  I say
 > selectively, as:
 >
 >   * it will unescape  &amp;
 >   * it won't unescape &copy;
 >   * it will unescape  &#38;
 >   * it won't unescape &#x26;
 >   * it will unescape  &#146;
 >   * it won't unescape &#8217;

And just why would you use sgmllib to handle RSS or ATOM feeds?  Neither is 
defined in terms of SGML.  The sgmllib documentation also notes that it isn't 
really a fully general SGML parser (it isn't), but that it exists primarily 
as a foundation for htmllib.

 > There are a number of issues here.  While not unescaping anything is
 > suboptimal, at least the recipient is aware of exactly which characters
 > have been unescaped (i.e., none of them).  The proposed solution makes
 > it impossible for the recipient to know which characters are unescaped,
 > and which are original.  (Note: feeds often contain such abominations as
 > &amp;copy; which the new code will treat indistinguishably from &copy;)

My suspicion is that the "right" thing to do at the sgmllib level is to 
categorize the markup and call a method depending on what the entity 
reference is, and let that handle whatever it is.  For SGML, that means we 
have things like &name; (entity references), &#123; (character references), 
and that's it.  &#x123; isn't legal SGML under any circumstance; 
the "&#x;" syntax was introduced with XML.

 > Additionally, there is a unicode issue here - one that is shared by
 > handle_charref, but at least that method is overrideable.  If unescaping
 > remains, do it for hex character references and for values greater than
 > 8-bits, i.e., use unichr instead of chr if the value is greater than 127.

For SGML, it's worse than that, since the document character set is defined in 
the SGML declaration, which is a far hairier beast than an XML 
declaration.  :-)

It really sounds like sgmllib is the wrong foundation for this.  While the 
module has some questionable behaviors, none of them are significant in its 
intended context (support for htmllib).  Now, I understand that 
RSS has historical issues, with HTML-as-practiced getting embedded as payload 
data with various flavors of escaping applied, and I'm not an expert in the 
details of that.  Have you looked at HTMLParser as an alternate to sgmllib?  
It has better support for XHTML constructs.

  -Fred

-- 
Fred L. Drake, Jr.   

Re: [Python-Dev] Switch statement

2006-06-11 Thread Greg Ewing

Talin wrote:

> Since you don't have the 'fall-through' behavior of C, I would also 
> assume that you could associate more than one value with a case, i.e.:
> 
> case 'a', 'b', 'c':
>    ...

Multiple values could be written

   case 'a':
   case 'b':
   case 'c':
 ...

without conflicting with the no-fallthrough semantics, since
a do-nothing case can be written as

   case 'd':
 pass

> I don't have any specific syntax proposals, but I notice that the suite 
> that follows the switch statement is not a normal suite, but a 
> restricted one,

I don't see that as a problem. And all the proposed syntaxes
I've ever seen for putting the cases at the same level as
the switch look ugly to me.

--
Greg

Re: [Python-Dev] Switch statement

2006-06-11 Thread Greg Ewing

[EMAIL PROTECTED] wrote:

> I agree, but that of course limits the expressions to constants which can be
> evaluated at compile-time as I indicated in my previous mail.

A way out of this would be to define the semantics so that
the expression values are allowed to be cached, and the
order of evaluation and testing is undefined. So the first
time through, the values could all be put in a dict, to
be looked up thereafter.
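
One way to picture these semantics is a helper that evaluates every case expression on first use and keeps the results in a dict. This is only a sketch of the idea in today's Python: CachedSwitch is a hypothetical class, and the lambdas stand in for the case expressions and case bodies.

```python
class CachedSwitch:
    """Evaluate case expressions once, on first use, then dispatch by dict."""

    def __init__(self, cases, default):
        # `cases` is a list of (thunk, handler); each thunk computes a case value.
        self._cases = cases
        self._default = default
        self._table = None

    def __call__(self, value):
        if self._table is None:
            # First time through: evaluate every case expression and cache it.
            self._table = {thunk(): handler for thunk, handler in self._cases}
        return self._table.get(value, self._default)()


START_TOKEN = "<"
END_TOKEN = ">"

switch = CachedSwitch(
    [(lambda: START_TOKEN, lambda: "start"), (lambda: END_TOKEN, lambda: "end")],
    default=lambda: "other",
)
```

After the first call, rebinding START_TOKEN has no effect on dispatch, which is exactly the behaviour the "allowed to be cached" wording would permit. This sketch happens to evaluate the thunks in list order, but code relying on that would be relying on something the proposed semantics leave undefined.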

--
Greg

Re: [Python-Dev] a note in random.shuffle.doc ...

2006-06-11 Thread Terry Jones

> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:

Greg> Terry Jones wrote:
>> Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
>> algorithm from taking multiple already returned values and combining them
>> in some (deterministic) way to generate > 5 outcomes.

Greg> No, it's not. As long as the RNG output is the only input to
Greg> the algorithm, and the algorithm is deterministic, it is
Greg> not possible to get more than N different outcomes. It doesn't
Greg> matter what the algorithm does with the input.

Greg> If the algorithm can start out with more than one initial
Greg> state, then the RNG is not the only input.

The code below uses a RNG with period 5, is deterministic, and has one
initial state. It produces 20 different outcomes.

It's just doing a simplistic version of what a lagged RNG does,
but the lagged part is in the "algorithm" not in the rng. That's why I said
if you included the state of the algorithm in what you meant by "state" I'd
be more inclined to agree.

Terry

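n = list(map(float, range(1, 17, 3)))   # 1.0, 4.0, ..., 16.0; only the first five are used
i = 0

def rng():
    # A "generator" with period 5: cycles through n[1], n[2], n[3], n[4], n[0].
    global i
    i += 1
    if i == 5: i = 0
    return n[i]

if __name__ == '__main__':
    seen = {}
    history = [rng()]
    o = 0
    for lag in range(1, 5):
        for x in range(5):
            o += 1
            new = rng()
            # Combine the new value with the one returned `lag` calls earlier.
            outcome = new / history[-lag]
            if outcome in seen: print("DUP!")
            seen[outcome] = True
            print("outcome %d = %f" % (o, outcome))
            history.append(new)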

# Outputs
outcome 1 = 1.750000
outcome 2 = 1.428571
outcome 3 = 1.300000
outcome 4 = 0.076923
outcome 5 = 4.000000
outcome 6 = 7.000000
outcome 7 = 2.500000
outcome 8 = 1.857143
outcome 9 = 0.100000
outcome 10 = 0.307692
outcome 11 = 0.538462
outcome 12 = 10.000000
outcome 13 = 3.250000
outcome 14 = 0.142857
outcome 15 = 0.400000
outcome 16 = 0.700000
outcome 17 = 0.769231
outcome 18 = 13.000000
outcome 19 = 0.250000
outcome 20 = 0.571429

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Terry Reedy


"Fred L. Drake, Jr." <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
> > Planet is a feed aggregator written in Python.  It depends heavily on
> > SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
> > and I've submitted a test case and a patch[1] (use or discard the 
> > patch,
> > it is the test that I care about).
...
> > and which are original.  (Note: feeds often contain such abominations 
> > as
> > &amp;copy; which the new code will treat indistinguishably from &copy;)

> It really sounds like sgmllib is the wrong foundation for this.
...
> Have you looked at HTMLParser as an alternate to sgmllib?
> It has better support for XHTML constructs.

Have you (the OP) checked how related Python projects, such as Mark 
Pilgrim's feed parser,
http://www.feedparser.org/
handle the same sort of input?  (I have only looked at docs and tests, not 
code.)

tjr




Re: [Python-Dev] Import semantics

2006-06-11 Thread Terry Reedy


"Fabio Zadrozny" <[EMAIL PROTECTED]> wrote in message
>Jython 2.1 on java1.5.0 (JIT: null)
>Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on 
>win32

Jython 2.1 intends to match Python 2.1, I believe.
Python 2.2, which I still have loaded, matches Python 2.4 in the behavior 
reported.




Re: [Python-Dev] subprocess.Popen(.... stdout=IGNORE, ...)

2006-06-11 Thread Terry Reedy


"Martin Blais" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>   Any idea how this idiom could be supported using a more portable
>  solution (i.e. how would I make this idiom under Windows, is there
>   some equivalent to /dev/null)?

On a DOS/Windows command line,  '>NUL:' or '>nul:'
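
From inside Python, a portable spelling is `os.devnull` (available since Python 2.4), which names the null device on POSIX and on Windows alike. A minimal sketch, assuming a present-day Python 3 interpreter; the child command is only a placeholder:

```python
import os
import subprocess
import sys

# os.devnull is the path of the platform's null device.
with open(os.devnull, 'w') as sink:
    proc = subprocess.Popen(
        [sys.executable, '-c', 'print("discarded")'],
        stdout=sink,            # the child's output goes nowhere
    )
    returncode = proc.wait()
```

Python 3.3 and later also provide `subprocess.DEVNULL`, which can be passed directly as `stdout` without opening a file.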





Re: [Python-Dev] Should hex() yield 'L' suffix for long numbers?

2006-06-11 Thread Tim Peters

[Ka-Ping Yee]
> I did this earlier:
>
> >>> hex(9999999999999)
> '0x9184e729fffL'
>
> and found it a little jarring, because i feel there's been a general
> trend toward getting rid of the 'L' suffix in Python.
>
> Literal long integers don't need an L anymore; they're automatically
> made into longs if the number is too big.  And while the repr() of
> a long retains the L on the end, the str() of a long does not, and
> i rather like that.
>
> So i kind of expected that hex() would not include the L either.
> I see its main job as just giving me the hex digits (in fact, for
> Python 3000 i'd prefer even to drop the '0x' as well), and the L
> seems superfluous and distracting.
>
> What do you think?  Is Python 2.5 a reasonable time to drop this L?

As I read PEP 237, that should have happened in Python 2.3 or 2.4.
This specific case is kinda muddy there.  Regardless, the only part
that was left for Python 3 was "phase C", and this is phase C in its
entirety:

 C. The trailing 'L' is dropped from repr(), and made illegal on
   input.  (If possible, the 'long' type completely disappears.)

It's possible, though, that hex() and oct() were implicitly considered
to be variants of repr() for purposes of phase C.  How much are we
willing to pay Guido to Pronounce?
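
For code that has to cope with either behaviour in the meantime, a small sketch (the helper name `hex_digits` is made up for illustration): strip a trailing 'L' if one is present, or use `%x` formatting, which does not emit the suffix.

```python
def hex_digits(n):
    """Return hex(n) without a trailing 'L', if the interpreter added one."""
    return hex(n).rstrip('L')

n = 9999999999999
with_prefix = hex_digits(n)     # keeps the '0x' prefix
bare = '%x' % n                 # digits only: no prefix, no suffix
```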

Re: [Python-Dev] a note in random.shuffle.doc ...

2006-06-11 Thread Tim Peters

[Terry Jones]
> The code below uses a RNG with period 5, is deterministic, and has one
> initial state. It produces 20 different outcomes.

Well, I'd call the sequence of 20 numbers it produces one outcome.
From that view, there are at most 5 outcomes it can produce (at most 5
distinct 20-number sequences).  In much the same way, there are at
most P distinct infinite sequences this can produce, if the PRNG used
by random.random() has period P:

def belch():
    import random, math
    start = random.random()
    i = 0
    while True:
        i += 1
        yield math.fmod(i * start, 1.0)

The trick is to define "outcome" in such a way that the original claim
is true :-)
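
To make that counting concrete, here is a sketch in present-day Python that reruns the lagged-ratio procedure once for each of the five possible starting positions of the period-5 generator, and treats each 20-number sequence as one outcome. The function names `make_rng` and `run` are invented for this illustration.

```python
def make_rng(start):
    """Period-5 generator over 1, 4, 7, 10, 13, starting after index `start`."""
    values = [1.0, 4.0, 7.0, 10.0, 13.0]
    state = {'i': start}

    def rng():
        state['i'] = (state['i'] + 1) % 5
        return values[state['i']]

    return rng


def run(start):
    """Return the 20 lagged ratios as one tuple, i.e. a single 'outcome'."""
    rng = make_rng(start)
    history = [rng()]
    ratios = []
    for lag in range(1, 5):
        for _ in range(5):
            new = rng()
            ratios.append(new / history[-lag])
            history.append(new)
    return tuple(ratios)


# One whole-sequence outcome per possible starting state of the generator.
sequences = {run(start) for start in range(5)}
distinct_within_one_run = len(set(run(0)))
```

A single run contains 20 distinct numbers, but the set of whole sequences can never be larger than the number of generator states.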

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby

Fred L. Drake, Jr. wrote:
> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
>  > Planet is a feed aggregator written in Python.  It depends heavily on
>  > SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
>  > and I've submitted a test case and a patch[1] (use or discard the patch,
>  > it is the test that I care about).
> 
> And it's a nice aggregator to use, indeed!
> 
>  > While looking around, a few things surfaced.  For starters, it would
>  > seem that the version of sgmllib in SVN HEAD will selectively unescape
>  > certain character references that might appear in an attribute.  I say
>  > selectively, as:
>  >
>  >   * it will unescape  &amp;
>  >   * it won't unescape &copy;
>  >   * it will unescape  &#38;
>  >   * it won't unescape &#x26;
>  >   * it will unescape  &#146;
>  >   * it won't unescape &#8217;
> 
> And just why would you use sgmllib to handle RSS or ATOM feeds?  Neither is 
> defined in terms of SGML.  The sgmllib documentation also notes that it isn't 
> really a fully general SGML parser (it isn't), but that it exists primarily 
> as a foundation for htmllib.

The feed itself is read first with SAX (with a fallback to sgmllib 
if the feed is not well formed, but that's beside the point). 
The embedded HTML portions are then processed with subclasses of 
sgmllib.SGMLParser.

>  > There are a number of issues here.  While not unescaping anything is
>  > suboptimal, at least the recipient is aware of exactly which characters
>  > have been unescaped (i.e., none of them).  The proposed solution makes
>  > it impossible for the recipient to know which characters are unescaped,
>  > and which are original.  (Note: feeds often contain such abominations as
>  > &amp;copy; which the new code will treat indistinguishably from &copy;)
> 
> My suspicion is that the "right" thing to do at the sgmllib level is to 
> categorize the markup and call a method depending on what the entity 
> reference is, and let that handle whatever it is.  For SGML, that means we 
> have things like &name; (entity references), &#123; (character references), 
> and that's it.  &#x123; isn't legal SGML under any circumstance; 
> the "&#x;" syntax was introduced with XML.

... but it effectively is valid HTML.  And as you point out below 
sgmllib's raison d’être is to support htmllib.

>  > Additionally, there is a unicode issue here - one that is shared by
>  > handle_charref, but at least that method is overrideable.  If unescaping
>  > remains, do it for hex character references and for values greater than
>  > 8-bits, i.e., use unichr instead of chr if the value is greater than 127.
> 
> For SGML, it's worse than that, since the document character set is defined 
> in 
> the SGML declaration, which is a far hairier beast than an XML 
> declaration.  :-)

understood

> It really sounds like sgmllib is the wrong foundation for this.  While the 
> module has some questionable behaviors, none of them are significant in 
> its intended context (support for htmllib).  Now, I understand that 
> RSS has historical issues, with HTML-as-practiced getting embedded as payload 
> data with various flavors of escaping applied, and I'm not an expert in the 
> details of that.  Have you looked at HTMLParser as an alternate to sgmllib?  
> It has better support for XHTML constructs.

HTMLParser is less forgiving, and generally less suitable for consuming 
HTML as practiced.

- Sam Ruby


Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby

Terry Reedy wrote:
> "Fred L. Drake, Jr." <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
>> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
>>> Planet is a feed aggregator written in Python.  It depends heavily on
>>> SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib,
>>> and I've submitted a test case and a patch[1] (use or discard the 
>>> patch,
>>> it is the test that I care about).
> ...
>>> and which are original.  (Note: feeds often contain such abominations 
>>> as
>>> &amp;copy; which the new code will treat indistinguishably from &copy;)
> 
>> It really sounds like sgmllib is the wrong foundation for this.
> ...
>> Have you looked at HTMLParser as an alternate to sgmllib?
>> It has better support for XHTML constructs.
> 
> Have you (the OP) checked how related Python projects, such as Mark 
> Pilgrim's feed parser,
> http://www.feedparser.org/
> handle the same sort of input?  (I have only looked at docs and tests, not 
> code.)

Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.

I'm a committer on that project:

http://sourceforge.net/project/memberlist.php?group_id=112328

I was investigating a bug in sgmllib which affected the feed parser (and 
therefore Planet), and noticed that there were changes in the SVN head 
of Python which broke three feed parser unit tests.

It is my belief that these changes will break other existing users of 
sgmllib.

- Sam Ruby

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Martin v. Löwis

Aahz wrote:
> When providing links to SF, please use the python.org tinyurl equivalent
> to ensure that people can easily see the bug/patch number:
> 
> http://www.python.org/sf?id=1504333

Although I usually use the path-style form:

http://www.python.org/sf/1504333

Regards,
Martin

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Fred L. Drake, Jr.

On Monday 12 June 2006 00:05, Sam Ruby wrote:
 > Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.

Cool.

 > I was investigating a bug in sgmllib which affected the feed parser (and
 > therefore Planet), and noticed that there were changes in the SVN head
 > of Python which broke three feed parser unit tests.
 >
 > It is my belief that these changes will break other existing users of
 > sgmllib.

This is good to know; thanks for pointing it out.

If you can summarize the specific changes to sgmllib that cause problems for 
the feed parser, and identify the tests there that rely on the old behavior, 
I'll be glad to look at the problems.  I expect to have some time in the next 
few evenings, so I should be able to look at these soon.

Is the SourceForge CVS the definitive development source for the feed parser?

  -Fred

-- 
Fred L. Drake, Jr.   

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Martin v. Löwis

Sam Ruby wrote:
> Planet is a feed aggregator written in Python.  It depends heavily on 
> SGMLLib.  A recent bug report turned out to be a deficiency in sgmllib, 
> and I've submitted a test case and a patch[1] (use or discard the patch, 
> it is the test that I care about).

I think (but am not sure) you are referring to patch #1462498 here,
which fixes bugs 1452246 and 1087808.

>   * it will unescape  &amp;
>   * it won't unescape &copy;

That must be because you have amp in your entitydefs, but not copy.

>   * it will unescape  &#38;
>   * it won't unescape &#x26;

That's because it doesn't recognize hex character references.
That's systematic, though: it doesn't just ignore them in attribute
values, but also in content.

>   * it will unescape  &#146;
>   * it won't unescape &#8217;

That's because the value is larger than 256, so chr() fails.

> There are a number of issues here.  While not unescaping anything is 
> suboptimal, at least the recipient is aware of exactly which characters 
> have been unescaped (i.e., none of them).  The proposed solution makes 
> it impossible for the recipient to know which characters are unescaped, 
> and which are original.  (Note: feeds often contain such abominations as 
> &amp;copy; which the new code will treat indistinguishably from &copy;)

The recipient should then add &copy; to entitydefs; sgmllib will
unescape copy, so the recipient can know not to unescape that.

Alternatively, the recipient could provide an empty entitydefs.

> Additionally, there is a unicode issue here - one that is shared by 
> handle_charref, but at least that method is overrideable.  If unescaping 
> remains, do it for hex character references and for values greater than 
> 8-bits, i.e., use unichr instead of chr if the value is greater than 127.

Alternatively, a callback function could be provided for character
references. Unfortunately, the existing callback is unsuitable,
as it is supposed to do the full processing; this callback should
return the replacement text. Generally assuming Unicode would be
wrong, though.

Would you like to contribute a patch?

Regards,
Martin

Re: [Python-Dev] 2.5 issues need resolving in a few days

2006-06-11 Thread Martin v. Löwis

Neal Norwitz wrote:
> The most important outstanding issue is the xmlplus/xmlcore issue.
> It's not going to get fixed unless someone works on it.  There's only
> a few days left before beta 1.  Can someone please address this?

From my point of view, I shall consider them resolved/irrelevant:
I'm going to step down as a PyXML maintainer, so I don't have to
worry anymore about how to maintain PyXML. If PyXML then gets
unmaintained, the problem goes away, otherwise, the new maintainer
will have to find a solution.

Regards,
Martin

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby

Fred L. Drake, Jr. wrote:
> On Monday 12 June 2006 00:05, Sam Ruby wrote:
>  > Just to be clear: Planet uses Mark's feed parser, which uses SGMLlib.
> 
> Cool.
> 
>  > I was investigating a bug in sgmllib which affected the feed parser (and
>  > therefore Planet), and noticed that there were changes in the SVN head
>  > of Python which broke three feed parser unit tests.
>  >
>  > It is my belief that these changes will break other existing users of
>  > sgmllib.
> 
> This is good to know; thanks for pointing it out.
> 
> If you can summarize the specific changes to sgmllib that cause problems for 
> the feed parser, and identify the tests there that rely on the old behavior, 
> I'll be glad to look at the problems.  I expect to have some time in the next 
> few evenings, so I should be able to look at these soon.
> 
> Is the SourceForge CVS the definitive development source for the feed parser?

Yes: but if you check out the CVS HEAD, you won't see any failures as 
I've committed changes that mitigate the problems I've found.

However, if you get the latest release instead, you will see that feeds 
that contain &lt; &amp; or &gt; in attribute values will get these 
converted to <, &, and > characters instead.  In some cases, this can 
cause problems, particularly if the output is reparsed by sgmllib.

Additionally, entity references in the range of &#128; to &#255; will 
cause the released Feed Parser to die with a UnicodeDecodeError.

My workarounds are to re-escape < and > characters, and to escape bare 
ampersands - beyond that I can't really tell for sure which ampersands 
need to be re-escaped, and which ones I should leave as is.
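
The ambiguity is easy to reproduce in isolation. The sketch below is not sgmllib's code; it is a stand-in (`unescape_known` and `ENTITYDEFS` are hypothetical names) that expands only the named references it knows about, which is the selective behaviour under discussion. It is fed `&amp;copy;` (a literal "&copy;" that the author escaped) and `&copy;` (a copyright sign reference).

```python
import re

# Assumed table for the illustration: 'amp' is known, 'copy' is not.
ENTITYDEFS = {'amp': '&'}

def unescape_known(value):
    """Expand &name; only when name is in ENTITYDEFS; leave the rest alone."""
    def replace(match):
        name = match.group(1)
        return ENTITYDEFS.get(name, match.group(0))
    return re.sub(r'&([A-Za-z][A-Za-z0-9]*);', replace, value)

escaped_twice = unescape_known('&amp;copy;')   # author meant the text "&copy;"
escaped_once = unescape_known('&copy;')        # author meant a copyright sign
```

Both inputs come out as the same string, so a consumer looking only at the result cannot tell which ampersands were original.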

And I first try decoding attributes in the original declared encoding 
and then fall back to iso-8859-1.  If a single attribute value contains 
both non-ASCII utf-8 characters and a numeric character reference above 
&#128; then this will produce incorrect results.

I also have committed a workaround to the incorrect parsing of 
attributes with quoted markup that I originally reported.

- Sam Ruby

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Sam Ruby

Martin v. Löwis wrote:
> 
> Alternatively, a callback function could be provided for character
> references. Unfortunately, the existing callback is unsuitable,
> as it is supposed to do the full processing; this callback should
> return the replacement text. Generally assuming Unicode would be
> wrong, though.
> 
> Would you like to contribute a patch?

If we can agree on the behavior, I would be glad to write up a patch.

It seems to me that the simplest way to proceed would be for the code 
that attempts to resolve character references (both named and numeric) 
in attributes to be isolated in a single method.  Subclasses that desire 
different behavior (including the existing Python 2.4 and prior 
behaviour) could simply override this method.

- Sam Ruby

Re: [Python-Dev] sgmllib Comments

2006-06-11 Thread Martin v. Löwis

Sam Ruby wrote:
> If we can agree on the behavior, I would be glad to write up a patch.
> 
> It seems to me that the simplest way to proceed would be for the code
> that attempts to resolve character references (both named and numeric)
> in attributes to be isolated in a single method.  Subclasses that desire
> different behavior (including the existing Python 2.4 and prior
> behaviour) could simply override this method.

In SGML, this is problematic: The named things are not character
references, they are entity references, and it isn't necessarily
the case that they expand to a character. For example, &author;
might expand to "Martin v. Löwis", and &logo; might refer to a
bitmap image which is unparsed.

That said, providing an overridable replacement function sounds
like the right approach. To keep with tradition, I would still
distinguish between character references and entity references,
i.e. providing two overridable functions instead. Returning
None could mean that no replacement is available.

As for default implementations, I think they should do what
currently happens: entity references are replaced according to
entitydefs, character references are replaced to bytes if
they are smaller than 256.

Contrary to what others said, it appears that SGML *does*
support hexadecimal character references, provided that
the SGML declaration contains the HCRO definition (which,
for HTML and XML, is defined as HCRO "&#x"). So it seems
safe to process hex character references by default (although
it isn't safe to assume Unicode, IMO).
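
A rough sketch of that shape, written for present-day Python rather than as a patch to sgmllib. The class and method names (`AttributeResolver`, `replace_charref`, `replace_entityref`, `resolve`) are placeholders, the entity table is an assumed one, and returning `str` rather than bytes is a simplification forced by Python 3, not a position on the Unicode question.

```python
import re

class AttributeResolver:
    """Resolve references in an attribute value via two overridable hooks."""

    # Assumed table for the illustration.
    entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

    # Matches &#123; (decimal), &#x7B; (hex) or &name; (entity reference).
    _ref = re.compile(
        r'&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][-.A-Za-z0-9]*));')

    def replace_charref(self, codepoint):
        """Return replacement text, or None if no replacement is available."""
        if codepoint < 256:
            return chr(codepoint)
        return None

    def replace_entityref(self, name):
        """Return replacement text, or None if the entity is unknown."""
        return self.entitydefs.get(name)

    def resolve(self, value):
        def substitute(match):
            decimal, hexadecimal, name = match.groups()
            if name is not None:
                text = self.replace_entityref(name)
            elif decimal is not None:
                text = self.replace_charref(int(decimal))
            else:
                text = self.replace_charref(int(hexadecimal, 16))
            # None means "leave the reference exactly as written".
            return match.group(0) if text is None else text
        return self._ref.sub(substitute, value)


class WideResolver(AttributeResolver):
    """A subclass that opts in to code points above 255."""
    def replace_charref(self, codepoint):
        return chr(codepoint)


default = AttributeResolver().resolve(
    'a &amp; b &#38; c &#x26; d &#8217; &copy;')
wide = WideResolver().resolve('&#8217;')
```

With the defaults, known entities and small character references (decimal or hex) are replaced, while `&#8217;` and `&copy;` pass through untouched; a subclass that wants different behaviour overrides one hook.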

Regards,
Martin
