Nick Raptis wrote:
<div class="moz-text-flowed" style="font-family: -moz-fixed">Dave Angel wrote:
As I said, you'd probably get in trouble if any of the lines had '&' or '<' characters in them. The following function from the standard library can be used to escape the line directly, or of course you could use the function Nick supplied.

xml.sax.saxutils.escape(data[, entities])

   Escape '&', '<', and '>' in a string of data.

   You can escape other strings of data by passing a dictionary as the
   optional entities parameter. The keys and values must all be
   strings; each key will be replaced with its corresponding value. The
   characters '&', '<' and '>' are always escaped, even if entities
   is provided.
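A minimal sketch of calling it, assuming Python 3 and a made-up sample line: '&', '<' and '>' are always escaped, and the optional dictionary adds further replacements such as double quotes.

```python
from xml.sax.saxutils import escape

# made-up sample line containing all three always-escaped characters
line = 'if a < b && c > d: print("done")'

# '&', '<' and '>' are always escaped ('&' is handled first)
plain = escape(line)

# the optional dict adds extra replacements, here for double quotes
quoted = escape(line, {'"': '&quot;'})

print(plain)
print(quoted)
```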

Let us know if that doesn't do the trick.

DaveA

Thanks, Dave, for the info on xml.sax.saxutils.escape.
I didn't know about this one.

For the rest, here is the source code of the xml.sax.saxutils.escape function:

---------------------------------------
def __dict_replace(s, d):
   """Replace substrings of a string using a dictionary."""
   for key, value in d.items():
       s = s.replace(key, value)
   return s

def escape(data, entities={}):
   """Escape &, <, and > in a string of data.

   You can escape other strings of data by passing a dictionary as
   the optional entities parameter.  The keys and values must all be
   strings; each key will be replaced with its corresponding value.
   """

   # must do ampersand first
   data = data.replace("&", "&amp;")
   data = data.replace(">", "&gt;")
   data = data.replace("<", "&lt;")
   if entities:
       data = __dict_replace(data, entities)
   return data
-----------------------------------------
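The "# must do ampersand first" comment is the key detail. A small sketch with two hypothetical helper functions (not part of xml.sax.saxutils) shows what goes wrong when '&' is replaced last: the '&' inside the freshly produced '&lt;' gets escaped again.

```python
def escape_wrong_order(data):
    # hypothetical helper: same replacements as escape(), but '&' handled last
    data = data.replace("<", "&lt;")
    data = data.replace(">", "&gt;")
    data = data.replace("&", "&amp;")  # also hits the '&' of '&lt;' and '&gt;'
    return data


def escape_right_order(data):
    # hypothetical helper: mirrors the order used in the source quoted above
    data = data.replace("&", "&amp;")
    data = data.replace(">", "&gt;")
    data = data.replace("<", "&lt;")
    return data


print(escape_wrong_order("a < b"))  # a &amp;lt; b
print(escape_right_order("a < b"))  # a &lt; b
```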

As you can see, it too uses str.replace to do the job.
However, using a standard library function that does what you want is preferable:
it's tested, and it might also be optimized to be faster.
Still, it's easy and fun to look into the source and know exactly what something does.
It's also one of the ways for a beginner (like me) to progress.

From the source code I can see this, for example:
*Don't pass the entity dictionary I proposed earlier to this function:*
entities = {'&' : '&amp;',
          '<' : '&lt;',
          '>' : '&gt;',
          '"' : '&quot;',
          "'" : '&apos;'}
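If you pass an entity for '&' into escape(), it gets applied to the already partially escaped string, so the '&amp;', '&lt;' and '&gt;' that escape() has just produced are escaped a second time (for instance '&lt;' becomes '&amp;lt;'), resulting in chaos.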
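A minimal sketch, assuming Python 3, of what happens with that dictionary. For this input only the '&' key ever matches, so the result does not depend on the order the dictionary is walked in:

```python
from xml.sax.saxutils import escape

# the dictionary proposed earlier in this thread
entities = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;'}

broken = escape("a < b", entities)  # the '&' key re-escapes the '&lt;'
fixed = escape("a < b", {'"': '&quot;', "'": '&apos;'})  # pass only the extras

print(broken)  # a &amp;lt; b
print(fixed)   # a &lt; b
```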

Come to think of it, this function not checking for an '&' entity passed to it might be worth qualifying as a bug :)

Nick


Yes, duplicating the &amp; entity would be a bug in the caller's code in this case (see my posted improvements to the OP's code, which removed the entities variable entirely). The question is whether this function's documentation should carry such a warning, or whether the function should make sure double substitution does not happen.

The &amp; entity is the only predefined entity in the S3 standard that has this problem. For example, there's no entity that replaces the letter 'a' or the semicolon. And a quote sign is never used within an encoded entity.
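One way to check that claim for the five predefined XML entities is to look for any key that occurs inside some replacement value; this is a quick sketch, not part of the library:

```python
# the five predefined XML entities, keyed by the character they replace
predefined = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

# a key is dangerous if it can match text that another substitution produced
clashing = sorted(
    key for key in predefined
    if any(key in value for value in predefined.values())
)

print(clashing)  # ['&']
```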

I think perhaps an improved version would either ignore an '&' key in the supplied dictionary or raise an exception if one is encountered. The question that must always be answered is whether such a change could break existing code.
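A sketch of both options as thin wrappers around the existing function; the names strict_escape and escape_ignoring_amp are hypothetical, not part of xml.sax.saxutils:

```python
from xml.sax.saxutils import escape


def strict_escape(data, entities=None):
    """Hypothetical wrapper: like escape(), but refuses an '&' key."""
    entities = entities or {}
    if "&" in entities:
        raise ValueError("'&' is always escaped first; do not pass it in entities")
    return escape(data, entities)


def escape_ignoring_amp(data, entities=None):
    """Hypothetical alternative: silently drop an '&' key."""
    extra = {k: v for k, v in (entities or {}).items() if k != "&"}
    return escape(data, extra)


print(strict_escape('say "hi"', {'"': '&quot;'}))     # say &quot;hi&quot;
print(escape_ignoring_amp("a < b", {"&": "&amp;"}))  # a &lt; b
```

Raising is the louder choice and surfaces the caller's mistake; ignoring the key keeps old calls working but hides it.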

There are legitimate reasons for a string to be escaped twice. Think of what happens when a website wants to quote some HTML source code. Or, a little less recursively, suppose you have a website teaching XML: the examples posted would need to be double-escaped. However, if someone had tried to do that in a single call to the current function, their code would already be broken, because the dictionary doesn't preserve order (Python 2 dicts are unordered), so the & substitution might not happen first. Such a user must call the escape function twice, without passing & at all.
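A sketch of the two-call approach, assuming a made-up HTML snippet; xml.sax.saxutils.unescape, which also exists in the standard library, undoes one level of escaping:

```python
from xml.sax.saxutils import escape, unescape

# made-up HTML snippet that a tutorial page wants to display as text
snippet = '<p class="note">Tom & Jerry</p>'

once = escape(snippet)  # one level: shows the tag itself on a page
twice = escape(once)    # two levels: shows the escaped source as text

print(once)
print(twice)
print(unescape(twice) == once)  # True: unescape removes one level
```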

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
