The solution is to know where you have byte strings and where you have
unicode objects. Parameters submitted through a form arrive as byte
strings, encoded with the encoding of the HTML page that contained the
form. The database also stores byte strings, in its own encoding. As a
general rule, use unicode objects throughout your program and identify
the boundaries where data comes in (forms) or gets serialized (the
database). Decode and encode at those boundaries and you are safe.
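In Python 3 terms, where `bytes` are byte strings and `str` is the unicode type, the rule looks roughly like the minimal sketch below. The function names and both encodings are hypothetical placeholders chosen for illustration:

```python
# Hypothetical encodings for the two boundaries.
FORM_ENCODING = "iso-8859-1"  # encoding of the HTML page the form was served with
DB_ENCODING = "utf-8"         # encoding the database stores its byte strings in


def from_form(raw: bytes) -> str:
    """Input boundary: decode form bytes into text as soon as they arrive."""
    return raw.decode(FORM_ENCODING)


def to_database(text: str) -> bytes:
    """Output boundary: encode text only when it is serialized."""
    return text.encode(DB_ENCODING)


# Inside the program everything is text (str).
name = from_form(b"Caf\xe9")
greeting = "Hello, " + name
stored = to_database(greeting)
```

Mixing the two types directly (for example `"Hello, " + b"Caf\xe9"`) raises a `TypeError` in Python 3, which makes a missed boundary easy to spot.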
For interest, here is how I handle this encoding/decoding for CGI
input. Two encodings are involved: the user's preferred encoding (as
sent by the browser in the Accept-Charset header, which CGI exposes as
HTTP_ACCEPT_CHARSET) and the application encoding (as used by the
database). Incoming form values are decoded from the preferred encoding
and re-encoded in the application encoding. For output I use
codecs.EncodedFile with the two encodings the other way round, so that
text is converted back to the preferred encoding before it is sent to the
browser. I hope this helps others understand how encoding/decoding works
in web applications; it took me a while to figure it out!
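The `codecs.EncodedFile` step can be sketched in Python 3 as follows. `EncodedFile(file, data_encoding, file_encoding)` decodes whatever is written to it using `data_encoding` and writes it to the underlying file re-encoded as `file_encoding`, so for output the application encoding goes first and the browser's preferred encoding second. `io.BytesIO` stands in for the CGI output stream, and both encoding values are assumptions for illustration:

```python
import codecs
import io

applicationEncoding = "utf-8"     # assumed database/application encoding
preferredEncoding = "iso-8859-1"  # assumed result of parsing Accept-Charset

# io.BytesIO stands in for the real response stream.
response = io.BytesIO()

# Bytes written in the application encoding are recoded to the browser's encoding.
out = codecs.EncodedFile(response, applicationEncoding, preferredEncoding)
out.write("Café".encode(applicationEncoding))
```

With the default `errors='strict'`, characters the preferred encoding cannot represent raise `UnicodeEncodeError`; `EncodedFile` also accepts an `errors` argument such as `'xmlcharrefreplace'` if you would rather emit HTML character references.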
Cheers
Peter
```python
import cgi
import codecs


def cgiescape(s, encodeAmp=False):
    """Replace HTML special characters in s with character entities.

    Non-string values (lists of fields, uploaded files, None) are
    returned unchanged.
    """
    if isinstance(s, basestring):
        # '&' must be handled first so the entities added below are not
        # escaped a second time.
        if encodeAmp:
            s = s.replace("&", "&amp;")
        s = s.replace("<", "&lt;")
        s = s.replace(">", "&gt;")
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#39;")
    return s


class Request(object):
    def __init__(self, input, environment, applicationEncoding='ascii'):
        preferredEncoding = self.__determinePreferredEncoding(
            environment, applicationEncoding)
        fieldStorage = cgi.FieldStorage(input, environ=environment)
        self.__properties = {}
        for key in fieldStorage.keys():
            field = fieldStorage[key]
            if isinstance(field, list):
                # Repeated field: transcode each plain value in place.
                for item in field:
                    if not item.filename:
                        item.value = self.__transcode(
                            item.value, preferredEncoding, applicationEncoding)
                self.__properties[key] = field
            elif field.filename:
                # File upload: keep the raw bytes untouched.
                self.__properties[key] = field
            else:
                self.__properties[key] = self.__transcode(
                    field.value, preferredEncoding, applicationEncoding)
        self.environ = environment

    @staticmethod
    def __transcode(value, fromEncoding, toEncoding):
        # Decode from the browser's encoding into a unicode object, then
        # encode for the database; characters the application encoding
        # cannot represent are replaced with '?'.
        return unicode(value, fromEncoding).encode(toEncoding, 'replace')

    def __determinePreferredEncoding(self, environment, applicationEncoding):
        # Accept-Charset looks like "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        # the first charset listed is taken as the preferred one.
        encodingHeader = environment.get('HTTP_ACCEPT_CHARSET', '')
        preferredEncoding = encodingHeader.split(';')[0].split(',')[0].strip()
        try:
            # Unknown names (including '*' and '') raise LookupError.
            codecs.lookup(preferredEncoding)
        except LookupError:
            # Dodgy input here may be a hack attempt; best to ignore it.
            return applicationEncoding
        return preferredEncoding

    def get(self, attr, escapeAmp=True):
        # HTML-escaped field value, or None if the field was not submitted.
        return cgiescape(self.__properties.get(attr), escapeAmp)
```
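The `Request` class above is Python 2 code, and the `cgi` module it relies on was removed from the standard library in Python 3.13. As a rough Python 3 sketch of the same decode-then-encode flow for a URL-encoded query string (GET requests only), `urllib.parse.parse_qs` can decode percent-escapes with the browser's encoding directly. The function names are hypothetical, and POST bodies and file uploads are not handled:

```python
import codecs
from urllib.parse import parse_qs


def preferred_encoding(environ, application_encoding="ascii"):
    """First charset listed in Accept-Charset, if Python knows it."""
    header = environ.get("HTTP_ACCEPT_CHARSET", "")
    candidate = header.split(";")[0].split(",")[0].strip()
    try:
        # Rejects unknown names, "*" and the empty string.
        codecs.lookup(candidate)
    except LookupError:
        return application_encoding
    return candidate


def form_fields(environ, application_encoding="ascii"):
    """Decode the query string into text, then encode each value for the database."""
    encoding = preferred_encoding(environ, application_encoding)
    # parse_qs returns {name: [str, ...]}; percent-escapes are decoded with `encoding`.
    fields = parse_qs(environ.get("QUERY_STRING", ""), encoding=encoding)
    return {
        key: [value.encode(application_encoding, "replace") for value in values]
        for key, values in fields.items()
    }
```

Like the original, this takes the first charset listed and ignores `q` values; a stricter parser would sort the candidates by quality before choosing.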
_______________________________________________
sqlobject-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss