On Fri, Nov 27, 2009 at 3:16 PM, Adrian Buehlmann <adr...@cadifra.com> wrote:
>
> On 27.11.2009 18:44, Stefan Rusek wrote:
>> Please see
>> http://bitbucket.org/tortoisehg/stable/issue/672/shell-extension-unicode-support
>> for more context.
>>
>> I am working to add support for unicode filenames to the shell
>> extension. Out of the box, THG currently uses the non-unicode API
>> calls in the shell extension. This works because on Windows hg uses
>> the same non-unicode API calls, and because it's built on hg, hgtk
>> also ends up using the non-unicode API calls. The fixutf8 hg extension
>> wraps all of the disk I/O calls with their unicode equivalents in
>> order to add support for unicode filenames on Windows. With the
>> extension enabled, both hg and hgtk work properly with unicode
>> filenames; however, the shell extension does not.
>>
>> The plan is to have the TortoiseHg RPC server pass the value of
>> mercurial._encoding to tortoisehg via the thgstatus file, so that the
>> shell extension knows how to read both the thgstatus and dirstate
>> files.
>>
>> The one issue left to sort out is more an issue of style. Currently
>> std::string is used throughout the shell extension for storing
>> filenames and paths. We could switch to std::wstring for all paths, or
>> we could do an #ifdef UNICODE and make it so that the shell extension
>> can be compiled in either unicode or non-unicode mode.
>>
>> I was originally in favor of the #ifdef approach, but there wouldn't
>> be any advantage to compiling to non-unicode, since Windows uses only
>> unicode under the hood: when we get filenames from Windows, they get
>> converted to non-unicode, and then when we call the non-unicode
>> version of CreateProcess to spawn hgtk, Windows automatically converts
>> the command line we pass in back to unicode. Additionally, it would
>> effectively create two versions of the shell extension that would
>> need to be supported.
>>
>
> Given my limited understanding of encoding issues, this statement
> may carry a high risk of shooting myself in the foot, but...
>
> I would say switch to std::wstring for all file paths and don't
> use any #ifdef UNICODE.
>
> The recoding from what's in .hg/dirstate to std::wstring should then
> happen when reading the .hg/dirstate into the shell extension's
> in-memory data structures.
>
> All file paths in memory should then be assumed to be encoded
> in whatever encoding Windows uses for wide-character filenames.
>
> It looks like this is UTF-16:
> http://msdn.microsoft.com/en-us/library/dd374081(VS.85).aspx
>
> So care must be taken, for example, when splitting a path
> into its parts (splitting on '\'), which is done when reading
> the dirstate in the current code.
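
To make the path-splitting concern above concrete, here is a minimal
sketch of splitting an already-decoded wide path into its components.
The name split_path is hypothetical and is not taken from the TortoiseHg
sources; the sketch assumes the recoding to std::wstring has already
happened when the dirstate was read. The reason splitting is safer on
the wide side: in some multi-byte code pages (Shift-JIS / code page 932,
for example) the byte 0x5C can be the second byte of a double-byte
character, so a byte-wise split on '\' may cut a character in half,
whereas in UTF-16 the code unit 0x005C is never part of a surrogate
pair.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not from the shell extension).
// Splits a wide-character path on `sep` and returns its components.
// Empty components (leading, trailing or doubled separators) are skipped.
std::vector<std::wstring> split_path(const std::wstring& path,
                                     wchar_t sep = L'\\')
{
    std::vector<std::wstring> parts;
    std::wstring::size_type start = 0;
    while (start <= path.size()) {
        // Find the next separator, or treat end-of-string as the boundary.
        std::wstring::size_type end = path.find(sep, start);
        if (end == std::wstring::npos)
            end = path.size();
        if (end > start)
            parts.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}
```

Calling split_path(L"C:\\repo\\src\\file.txt") should yield the four
components "C:", "repo", "src" and "file.txt". The separator is a
parameter so the same helper can be pointed at '/' if the stored paths
turn out to use that instead.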
I don't see a need to use #ifdefs throughout the code either. I'm
guessing we'll want to bundle the fixutf8 extension as well so it can
be easily enabled.

--
Steve Borho

_______________________________________________
Tortoisehg-develop mailing list
Tortoisehg-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tortoisehg-develop