>> Many database engines are encoding-aware, and distinguish between
>> 'text' columns and 'blob' columns -- the latter are arbitrary bags
>> of bytes, but text columns store text, and a good database (with a
>> sensibly designed database) will be aware of this and handle
>> encoding and decoding of text responsibly.

OK, by this definition, the dbm interface of Unix is not a good
database. Tough luck.

>> I can tell you that in REALbasic, if your database is properly
>> configured to use UTF-8 encoding, the rest is all handled
>> seamlessly -- you just store and retrieve text, and don't have to
>> worry about encoding and decoding things all over the place.

In Python, the database system is independent of the programming
language. Python can deal with

>> So the OP's request is quite valid.

Which of the questions specifically?

Q: Can you put UTF-8 characters in a dbhash in python 2.5 ?
A: Sure, certainly.

Q: Do I need to change the bsd db library, or there is no way to
   make it work with python 2.5 ?
A: You don't need to change the bsd db library; it works out of
   the box.

Q: What about python 2.6 ?
A: It's the same.

He got essentially the answers to the questions he asked.

>> Python's handling of encodings is currently primitive compared to
>> some other environments, and I see that this extends to the
>> database modules.

That's *not* a question that he had asked. He asked about UTF-8, but
perhaps meant to ask about Unicode (in particular as his example
didn't demonstrate any problems with UTF-8 encoded strings).

>> Fine, fair enough, it is what it is, but there is no harm in asking
>> about (or even yearning for) a more intelligent system that does
>> more of the grunt work for us.

It *is* important to understand the difference between a "UTF-8
string" and a "Unicode string". If the OP hadn't been confused about
the two, and had fully understood the difference, he probably wouldn't
have needed to ask.

Regards,
Martin

-- 
http://mail.python.org/mailman/listinfo/python-list
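
To make the distinction between a "UTF-8 string" and a "Unicode
string" concrete, here is a minimal sketch. It rests on several
assumptions that are not from the thread: it is written for current
Python 3 rather than 2.5/2.6, it uses the standard library's dbm.dumb
module as a stand-in for dbhash (which no longer exists in Python 3),
and the roundtrip() helper and the sample text are made up for
illustration. The point it shows is only that a dbm-style store holds
bytes, so text is encoded on the way in and decoded on the way out.

```python
import dbm.dumb
import os
import tempfile

# A Unicode string: a sequence of code points ("Gruesse" with u-umlaut and sharp s).
text = "Gr\u00fc\u00dfe"

# A UTF-8 string: the bytes that represent that text in one particular encoding.
encoded = text.encode("utf-8")


def roundtrip(key, value):
    """Hypothetical helper: store text as UTF-8 bytes, then read it back as text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        # The database only ever sees bytes; the encoding step is ours.
        with dbm.dumb.open(path, "c") as db:
            db[key.encode("utf-8")] = value.encode("utf-8")
        # Reading gives bytes back; decoding turns them into text again.
        with dbm.dumb.open(path, "r") as db:
            return db[key.encode("utf-8")].decode("utf-8")


result = roundtrip("greeting", text)
```

The text has five characters but its UTF-8 form has seven bytes, which
is exactly the gap between the two notions: the database stores the
seven bytes without caring what they mean, and the program decides
that they are UTF-8.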
