Hi Zooko,

I finally managed to find enough time today to investigate this issue further. Basically, test_unicode_filename exposes strings which are not being converted as expected.
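For illustration, here is a minimal sketch of the "decode early, unicode everywhere, encode late" pattern. It uses Python 3 terms, where bytes plays the role of a Python 2 bytestring and str the role of a unicode string. The helper names (decode_args, b32_of_text, encode_output) and the sample arguments are hypothetical, and the standard library's base64.b32encode only stands in for a bytestring-only module such as util/base32.py.

```python
import base64

# Hypothetical helpers illustrating the pattern; not Tahoe's actual API.

def decode_args(argv_bytes, encoding="utf-8"):
    """Decode early: turn raw byte arguments into text at the program boundary."""
    return [arg.decode(encoding) for arg in argv_bytes]

def b32_of_text(text, encoding="utf-8"):
    """Wrap a bytes-only API: encode just before the call, decode its ASCII result.

    base64.b32encode stands in for a bytestring-only module like util/base32.py.
    """
    return base64.b32encode(text.encode(encoding)).decode("ascii")

def encode_output(text, encoding="utf-8"):
    """Encode late: produce bytes only when writing a filename, URL, etc."""
    return text.encode(encoding)

# Simulated command line arriving as UTF-8 bytestrings (sample values).
raw_argv = [b"put", "caf\u00e9.txt".encode("utf-8")]

args = decode_args(raw_argv)   # text from here on: ['put', 'café.txt']
name = args[1]                 # 8 characters, although the raw argument is 9 bytes
tag = b32_of_text(name)        # str, safe to mix with other text
out = encode_output(name)      # back to bytes only at the output boundary
```

Decoding can fail: b"\xff".decode("utf-8") raises UnicodeDecodeError, so wherever the conversion ends up living, it needs a policy for arguments that are not valid UTF-8. (In Python 3 itself, sys.argv already arrives as str, and os.fsencode() recovers the original bytes.)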
As Brian pointed out in [1], the current codebase calls simplejson.dumps with bytestrings coming from the command line. This might sometimes work, but it is definitely not recommended. The same kind of issue appears with UTF-8 filenames in the FTP and SFTP servers.

We usually have UTF-8 bytestrings as input (sys.argv, filenames, aliases, etc.) and need UTF-8 bytestrings as output (URLs, filenames, etc.). However, it is usually simpler and safer to use unicode strings internally. Kumar McMillan gives the following advice in his talk [2]:

 1. Decode early
 2. Unicode everywhere
 3. Encode late

He also recommends writing wrappers for libraries which are not unicode-compliant (urllib, for example).

Does this sound coherent in the context of Tahoe? If so, where are the best places to handle these conversions? Should we (1) automatically convert sys.argv[] from bytestrings to unicode in runner.runner(), or (2) do it selectively for each command (put, cp, etc.)?

I tried (1), see patch [3], and it did fix the test failure on slave3 (the Dapper box). However, it broke many other tests at the same time, mostly assertions in util/base32.py, which seems to require bytestrings rather than unicode strings.

François

[1] http://allmydata.org/trac/tahoe/ticket/534#comment:31
[2] http://farmdev.com/talks/unicode/
[3]
--- old-tahoe/src/allmydata/scripts/runner.py	2008-12-22 07:33:51.000000000 -0800
+++ new-tahoe/src/allmydata/scripts/runner.py	2008-12-22 07:33:52.000000000 -0800
@@ -33,6 +33,12 @@
            stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr,
            install_node_control=True, additional_commands=None):
 
+    # Convert arguments to unicode
+    new_argv = []
+    for arg in argv:
+        new_argv.append(arg.decode('utf-8'))
+    argv = new_argv
+
     config = Options()
     if install_node_control:
         config.subCommands.extend(startstop_node.subCommands)

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
