On 23/01/2006, at 4:59 PM, Deron Meranda wrote:

I like the SSI feature.  It would fill a nice gap between using
plain HTML files and having to go to a more featured template
or engine.  Some things are simple enough that the SSI concept
should be enough, and having Python would be nice.

I do need to give your proposal some more thought before I can
properly comment, but it looks interesting so far.

One thing I think that *should* be very easy to do in an SSI setting
is HTML-escaping.  I shouldn't have to do something like
'from cgi import escape'.  Perhaps adding another parameter, like

 <!--#python esc="h" eval="print '1<2'"-->

where esc is a built-in escaping filter: h=html, u=url, x=xml
(difference between h and x is how it escapes quote chars).

Using "print" like that can't be done in an "eval"; it would need to
use "exec". Even then, you have to explicitly direct the "print"
to the filter, as it is not possible with mod_python to change
sys.stdout so that "print" by itself could be used. Thus, at the
cost of being more verbose:

  <!--#python escape="html" exec="print >> filter, '1<2'" -->

Unfortunately though, this would not work. This is because it is
writing directly to the filter and there is no opportunity to
escape the content as it is written. Such automatic escaping could
only be done for "eval", and only if the content is the result of
the expression. Code could still write directly to the filter
object and bypass the escaping.

Thus, although it indeed sounds like it may be a useful thing to have,
it would not be practical to implement it such that it was able to
capture all generated content. Thus, requiring the user to explicitly
escape where needed might still be necessary.
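
As a rough illustration of that limitation, the sketch below applies
a named escape only to the result of an evaluated expression. The
names ESCAPES, FakeFilter and run_eval are hypothetical and not part
of mod_python, and the escaping routines are taken from the Python 3
standard library as stand-ins (cgi.escape no longer exists in current
Python versions).

```python
import html
import urllib.parse
import xml.sax.saxutils

# Hypothetical mapping from an "escape" attribute value to a routine.
ESCAPES = {
    "html": html.escape,
    "url": urllib.parse.quote,
    "xml": xml.sax.saxutils.escape,
}


class FakeFilter:
    """Stand-in for the filter object; it only collects what is written."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def run_eval(expr, namespace, out_filter, escape=None):
    """Evaluate expr and write its result, escaped if requested."""
    result = str(eval(expr, namespace))
    if escape is not None:
        result = ESCAPES[escape](result)
    out_filter.write(result)


out = FakeFilter()
namespace = {"filter": out}

# The result of the expression passes through the escaping routine.
run_eval("'1<2'", namespace, out, escape="html")

# Code that writes directly to the filter bypasses the escaping.
exec("filter.write('1<2')", namespace)

print(out.chunks)
```

The first write comes out escaped and the second does not, which is
the caveat that would need documenting.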

In terms of avoiding having to do imports at each point of use,
I have got it going now so that all code executing in the page uses
the same local variable space. Thus one could do a whole lot of
imports and variable assignments in one exec at the start and
then use them later on. Thus:

  <!--#python exec="
  from mod_python import apache
  example = apache.import_module('example')
  import cgi, sys
  "-->
  <html>
    <body>
      <p><!--#python eval="cgi.escape(sys.version)" --></p>
      <p><!--#python exec="example.output_body(filter)" --></p>
    </body>
  </html>
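
The shared variable space can be sketched in a few lines. This is
only an illustration under simplifying assumptions, not the actual
mod_python code: the DIRECTIVE pattern and the render function are
hypothetical, code containing double quotes is not handled, and
"exec" blocks produce no output here.

```python
import re

# Hypothetical pattern for directives of the form
# <!--#python exec="..." --> and <!--#python eval="..." -->.
DIRECTIVE = re.compile(r'<!--#python\s+(exec|eval)="(.*?)"\s*-->', re.DOTALL)


def render(page):
    """Expand the directives in page using one shared namespace."""
    namespace = {}

    def expand(match):
        kind, code = match.group(1), match.group(2)
        if kind == "exec":
            # Names bound here stay visible to later directives.
            exec(code.strip(), namespace)
            return ""
        return str(eval(code, namespace))

    return DIRECTIVE.sub(expand, page)


page = '''<!--#python exec="
import html
greeting = '1<2'
"-->
<p><!--#python eval="html.escape(greeting)" --></p>'''

print(render(page))
```

Because every directive is run against the same dictionary, the
import and the assignment made in the first block are available to
the later "eval".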

Another question.  How are character sets handled?  If the
output is a Unicode string, how does it get encoded?  Should
it always assume say UTF-8, or can it determine the actual
character encoding for this response somehow?

Changes made in the mod_python.publisher layer in mod_python 3.2
might be used as a guide here. Ie., it tries to be smart about
encoding:

        elif isinstance(object,UnicodeType):

            # We've got an Unicode string to publish, so we have to encode
            # it to bytes. We try to detect the character encoding
            # from the Content-Type header
            if req._content_type_set:

                charset = re_charset.search(req.content_type)
                if charset:
                    charset = charset.group(1)
                else:
                    # If no character encoding was set, we use UTF8
                    charset = 'UTF8'
                    req.content_type += '; charset=UTF8'

            else:
                # If no character encoding was set, we use UTF8
                charset = 'UTF8'

            result = object.encode(charset)

In mod_python.publisher you can see though that it only worries
about it if the result is a Unicode string. Ie., if anything else
is returned or the request object is written to direct, then it is
up to the user to implement what they want.

If this approach in mod_python.publisher is seen as reasonable and
works, then for 'eval' one could take the same approach. If you are
going to allow that though, then allowing "escape" for "eval" might
also be seen as reasonable. One would just need to document the
caveats.
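
To make the same approach concrete for 'eval', here is a standalone
sketch of the charset detection, written for Python 3 so it can be
run outside mod_python. The encode_result function is hypothetical,
and the re_charset pattern is an assumption; the one used in
mod_python.publisher may differ.

```python
import re

# Assumed pattern; the real re_charset in mod_python.publisher may differ.
re_charset = re.compile(r"charset\s*=\s*([^\s;]+)", re.IGNORECASE)


def encode_result(text, content_type=None):
    """Return (encoded bytes, content type) for a text result."""
    if content_type is not None:
        match = re_charset.search(content_type)
        if match:
            charset = match.group(1)
        else:
            # No character encoding in the header, so fall back to UTF8
            # and record that choice in the content type.
            charset = "UTF8"
            content_type += "; charset=UTF8"
    else:
        # No content type set at all, so fall back to UTF8.
        charset = "UTF8"
    return text.encode(charset), content_type


print(encode_result("caf\u00e9", "text/html; charset=ISO-8859-1"))
print(encode_result("caf\u00e9", "text/html"))
print(encode_result("caf\u00e9"))
```

As in the publisher code, only a text result is handled; anything
written directly to the filter would still be up to the user.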

That said, can you explain more about the differences between HTML
and XML escaping with the quoting? I don't really understand the
differences. Are there particular Python routines that implement
each variant, or is it some option to 'cgi.escape'? Also, which is
the preferred routine for url encoding?

Thanks for your interest.

Graham
