On Fri, Nov 27, 2009 at 3:16 PM, Adrian Buehlmann <adr...@cadifra.com> wrote:
>
>
> On 27.11.2009 18:44, Stefan Rusek wrote:
>> Please see 
>> http://bitbucket.org/tortoisehg/stable/issue/672/shell-extension-unicode-support
>> for more context.
>>
>> I am working to add support for unicode filenames to the shell
>> extension. Out of the box, THG currently uses the non-unicode api
>> calls in the shell extension. This works because on windows hg uses
>> the same non-unicode api calls, and because it's built on hg, hgtk
>> also ends up using the non-unicode api calls. The fixutf8 hg extension
>> wraps all of the disk io calls with their unicode equivalents in order
>> to add support for unicode filenames on windows. With the extension
>> enabled, both hg and hgtk work properly with unicode filenames,
>> however the shell extension does not.
>>
>> The plan is to have the TortoiseHg RPC server pass the value of
>> mercurial._encoding to tortoisehg via the thgstatus file, so that the
>> shell extension knows how to read both the thgstatus and dirstate
>> files.
>>
>> The one issue left to sort out is more of an issue of style. Currently
>> std::string is used throughout the shell extension for storing
>> filenames and paths. We could switch to std::wstring for all paths, or
>> we could do an #ifdef UNICODE and make it so that the shell extension
>> could be compiled in either unicode or non-unicode.
>>
>> I was originally in favor of the #ifdef approach, but there wouldn't
>> be any advantage to compiling to non-unicode. Windows uses only
>> unicode under the hood, so when we get filenames from windows, they
>> get converted to non-unicode, and then when we call the non-unicode
>> version of CreateProcess to spawn hgtk, windows automatically
>> converts the command line we pass in back to unicode.
>> Additionally, it effectively creates two versions of the shell
>> extension that would need to be supported.
>>
>
> Given my limited understanding of encoding issues, this statement
> carries a high risk of shooting myself in the foot, but...
>
> I would say switch to std::wstring for all file paths and don't
> use #ifdef UNICODE's.
>
> The recoding from what's in .hg/dirstate to std::wstring should then
> happen when reading the .hg/dirstate into the shell extension's data
> structures in memory.
>
> All file paths in memory should then be assumed to be encoded
> in whatever Windows' encoding for wide character string filenames
> is.
>
> It looks like this is UTF-16:
> http://msdn.microsoft.com/en-us/library/dd374081(VS.85).aspx
>
> So care must be taken, for example, when splitting a path
> into its parts (splitting on '\'), which is done when reading
> the dirstate in the current code.

I don't see a need to use #ifdefs throughout the code either.  I'm
guessing we'll want to bundle the fixutf8 extension as well so it can
be easily enabled.
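
On the path-splitting point above, here is a minimal sketch, assuming the
paths are already held as std::wstring by the time they are split.
split_path is a hypothetical helper for illustration, not existing shell
extension code. In UTF-16 the code unit 0x005C can never be half of a
surrogate pair, so comparing individual wchar_t values against L'\\' should
be safe once the path is wide.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not taken from the TortoiseHg sources): split a wide
// path into its components on '\' or '/'. Empty components produced by
// leading, trailing, or doubled separators are skipped.
std::vector<std::wstring> split_path(const std::wstring& path)
{
    std::vector<std::wstring> parts;
    std::wstring current;
    for (std::wstring::size_type i = 0; i < path.size(); ++i) {
        const wchar_t c = path[i];
        if (c == L'\\' || c == L'/') {
            // Separator: close the current component if there is one.
            if (!current.empty()) {
                parts.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    // Keep the last component when the path does not end in a separator.
    if (!current.empty())
        parts.push_back(current);
    return parts;
}
```

For example, split_path(L"C:\\repo\\src\\file.txt") yields the four
components "C:", "repo", "src" and "file.txt". Accepting '/' as well is an
assumption made here so the same helper could handle separators as stored
by hg.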

--
Steve Borho

_______________________________________________
Tortoisehg-develop mailing list
Tortoisehg-develop@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tortoisehg-develop
