Hans Müller wrote:
> Hi python experts,
>
> at the moment I'm struggling with an annoying problem in conjunction with
> mysql.
>
> I'm fetching rows from a database, which the mysql driver returns as a list
> of tuples.
>
> The default encoding of the database is utf-8.
>
> Unfortunately in the database there are rows with different encodings and
> there is a blob column.
>
> In the app. I search for double entries in the database with this code.
>
> hash = {}
> cursor.execute("select * from table")
> rows = cursor.fetchall()
> for row in rows:
>     key = "|".join([str(x) for x in row])   <- here the problem arises
>     if key in hash:
>         print "found double entry"
>
> This code works as expected with python 2.5.2
> With 2.5.1 it shows this error:
>
> key = "|".join(str(x) for x in row)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u017e' in
> position 3: ordinal not in range(128)
>
> When I replace the str() call by unicode(), I get this error when a blob
> column is being processed:
>
> key = "|".join(unicode(x) for x in row)
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 119:
> ordinal not in range(128)
>
> Please help: how can I convert ANY column data to a string which is usable
> as a key to a dictionary? The purpose of using a dictionary is to find
> equal rows in some database tables. Perhaps using an md5 hash of the
> column data is also an idea?
>
> Thanks a lot in advance,
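The md5 idea from the question can work if each field is turned into bytes explicitly, instead of going through the implicit ASCII codec that str()/unicode() fell back on. Below is a minimal sketch, assuming Python 3 (so text columns arrive as `str` and blob columns as `bytes`; the thread itself uses Python 2, where `str` and `unicode` play roughly those roles). The helper `row_key` and the sample rows are hypothetical:

```python
import hashlib


def row_key(row):
    """Return an MD5 hex digest identifying a row's contents.

    Hypothetical helper: text is encoded as UTF-8, blobs (bytes) are used
    as-is, anything else goes through repr(). A one-byte type tag plus a
    length prefix keeps ("a|b",) and ("a", "b"), or "x" and b"x", apart.
    """
    h = hashlib.md5()
    for value in row:
        if isinstance(value, str):
            tag, data = b"s", value.encode("utf-8", "surrogatepass")
        elif isinstance(value, (bytes, bytearray)):
            tag, data = b"b", bytes(value)
        else:
            tag, data = b"r", repr(value).encode("utf-8", "surrogatepass")
        h.update(tag + str(len(data)).encode("ascii") + b":" + data)
    return h.hexdigest()


# Made-up rows mixing non-ASCII text, a non-UTF-8 blob and NULL.
rows = [
    (1, "Kne\u017eevi\u0107", b"\xfc\x00binary"),
    (2, "plain", None),
    (1, "Kne\u017eevi\u0107", b"\xfc\x00binary"),
]

seen = {}
for index, row in enumerate(rows):
    key = row_key(row)
    if key in seen:
        print("found double entry at index", index)  # prints: found double entry at index 2
    else:
        seen[key] = row
```

MD5 serves here only as a compact fingerprint, not for security; keeping the row tuple itself as the key avoids even the theoretical chance of a collision.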
No direct answer, but can't you put the rows into the dict (or a set)
without converting them to a string?

seen = set()
for row in rows:
    if row in seen:
        print "dupe"
    else:
        seen.add(row)

Or, even better, solve the problem within the db:

select <fields> from <table> group by <fields> having count(*) > 1

Peter

--
http://mail.python.org/mailman/listinfo/python-list
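Both suggestions can be tried side by side. This is a sketch in Python 3 that uses the standard library's `sqlite3` in-memory database as a stand-in for MySQL; the `entries` table, its columns and the sample data are hypothetical:

```python
import sqlite3

# sqlite3 stands in for MySQL here; table, columns and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, data BLOB)")
conn.executemany(
    "INSERT INTO entries (name, data) VALUES (?, ?)",
    [
        ("Kne\u017eevi\u0107", b"\xfc\x00"),
        ("plain", None),
        ("Kne\u017eevi\u0107", b"\xfc\x00"),
    ],
)

# 1) In Python: fetched rows are tuples, hashable as long as every field is
#    (str, bytes, int, None, ...), so no str()/unicode() conversion is needed.
rows = conn.execute("SELECT name, data FROM entries").fetchall()
seen = set()
dupes = []
for row in rows:
    if row in seen:
        dupes.append(row)
    else:
        seen.add(row)

# 2) In the database: let GROUP BY ... HAVING find duplicated combinations.
db_dupes = conn.execute(
    "SELECT name, data, COUNT(*) FROM entries "
    "GROUP BY name, data HAVING COUNT(*) > 1"
).fetchall()

print(len(dupes), "duplicate row(s) found in Python")
print(len(db_dupes), "duplicated combination(s) found in SQL")
```

The set approach relies on the rows being tuples of hashable values; if a driver returned a mutable type such as `bytearray` for blob columns, those fields would need converting to `bytes` first. On MySQL, grouping on BLOB/TEXT columns may compare only the first `max_sort_length` bytes, so very long blobs that differ only near the end could be reported as duplicates by the SQL variant.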