The solution is: Know where you have byte strings and where you have
unicode objects. If you have a form, parameters will be byte strings
encoded with the encoding of the HTML page. The database stores byte
strings and has an encoding as well. As a general rule you should use
unicode objects in your program and know the boundaries where data comes
in (forms) or gets serialized (database). Encode/decode at those
boundaries and you are safe.
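
As a rough illustration of that rule, here is a minimal sketch. The function names and the two encodings are made up for the example, and it is written so that it also runs on Python 3, where unicode objects are plain str:

```python
def decode_input(raw_bytes, page_encoding):
    # Boundary in: bytes from a form become a unicode object.
    return raw_bytes.decode(page_encoding)

def encode_output(text, db_encoding):
    # Boundary out: unicode is serialized to bytes for the database.
    return text.encode(db_encoding)

form_value = b"caf\xe9"                          # as posted from an ISO-8859-1 page
text = decode_input(form_value, "iso-8859-1")    # unicode inside the program
stored = encode_output(text, "utf-8")            # bytes again at the database
```

Everything between the two calls works on unicode only, so the program never has to guess which encoding a byte string is in.
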
Just for interest, here is how I do this encoding/decoding for CGI input. There are two encodings involved: the user's preferred encoding (as sent by the browser in the Accept-Charset header, which CGI exposes as HTTP_ACCEPT_CHARSET) and the application encoding (as used by the database). Input is decoded from the preferred encoding and re-encoded in the application encoding. For output I use codecs.EncodedFile (with the preferred and application encodings reversed) to recode the output before it's sent to the browser. I hope this helps others understand how encoding/decoding works with web applications; it took me a while to figure it out!
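
The output side is not part of the code in this message, so here is only a minimal sketch of what wrapping a stream with codecs.EncodedFile might look like. The two encodings are assumptions, and an in-memory buffer stands in for the real output stream:

```python
import codecs
import io

app_encoding = "utf-8"             # assumed application/database encoding
preferred_encoding = "iso-8859-1"  # assumed browser preference

raw = io.BytesIO()  # stands in for the stream that goes to the browser

# Bytes written to `out` are decoded as app_encoding and then
# re-encoded as preferred_encoding before they reach `raw`.
out = codecs.EncodedFile(raw, data_encoding=app_encoding,
                         file_encoding=preferred_encoding)

out.write(u"caf\u00e9".encode(app_encoding))
sent = raw.getvalue()
```

With the default errors='strict', a character that the preferred encoding cannot represent raises UnicodeEncodeError; codecs.EncodedFile also accepts an errors argument if replacement is preferable.
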

Cheers

Peter



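import cgi

def cgiescape(s, encodeAmp=False):
    # Escape HTML-special characters; non-string values pass through unchanged.
    if isinstance(s, basestring):
        if encodeAmp:
            # Must run first so the entities added below are not escaped again.
            s = s.replace("&", "&amp;")
        s = s.replace("<", "&lt;")
        s = s.replace(">", "&gt;")
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#39;")
    return s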


class Request:
    def __init__(self, input, environment, applicationEncoding='ascii'):
        preferredEncoding = self.__determinePreferredEncoding(environment, applicationEncoding)

        fieldStorage = cgi.FieldStorage(input, environ=environment)

        self.__properties = {}
        for key in fieldStorage.keys():
            field = fieldStorage[key]
            if isinstance(field, list):
                # Repeated field: recode each value in place.
                for item in field:
                    unicodeValue = unicode(item.value, preferredEncoding)
                    # The second argument of encode() names an error handler;
                    # 'replace' is assumed here.
                    appValue = unicodeValue.encode(applicationEncoding, 'replace')
                    item.value = appValue
                self.__properties[key] = field
            elif field.filename:
                # File upload: keep the field object untouched.
                self.__properties[key] = field
            else:
                unicodeValue = unicode(field.value, preferredEncoding)
                appValue = unicodeValue.encode(applicationEncoding, 'replace')
                self.__properties[key] = appValue
        self.environ = environment
    
    def __determinePreferredEncoding(self, environment, applicationEncoding):
        # Use the first charset listed in Accept-Charset, or fall back to
        # the application encoding when the header is missing or unusable.
        try:
            encodingHeader = environment['HTTP_ACCEPT_CHARSET']
        except KeyError:
            preferredEncoding = applicationEncoding
        else:
            try:
                elements = encodingHeader.split(';')
                encodings = elements[0].split(',')
                preferredEncoding = encodings[0].strip()
            except Exception:
                # If we get dodgy input here it may be a hack attempt, best to ignore
                preferredEncoding = applicationEncoding

        return preferredEncoding

    def get(self, attr, escapeAmp=True):
        # Returns None for fields that were not submitted.
        try:
            value = self.__properties[attr]
        except KeyError:
            return None
        if escapeAmp:
            value = cgiescape(value, escapeAmp)
        return value
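
The code above takes the first charset in the header on trust. As a possible refinement, and only as a sketch, the name could be checked with codecs.lookup before it is used, so that an unknown name or "*" falls back to the application encoding. The function name pick_charset is hypothetical, and q-values are ignored just as in the code above:

```python
import codecs

def pick_charset(accept_charset, fallback):
    # Return the first listed charset that Python has a codec for.
    if not accept_charset:
        return fallback
    for item in accept_charset.split(","):
        name = item.split(";")[0].strip()   # drop any ";q=..." parameter
        if not name or name == "*":
            continue
        try:
            codecs.lookup(name)
        except (LookupError, ValueError):
            # Unknown or malformed name: try the next one.
            continue
        return name
    return fallback

choice = pick_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7", "ascii")
```

This only checks that a codec with that name exists; it does not rank charsets by preference.
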
_______________________________________________
sqlobject-discuss mailing list
sqlobject-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
