> On Dec. 20, 2016, 12:29 a.m., Daniel Pravat wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, line 748
> > <https://reviews.apache.org/r/54877/diff/1/?file=1589098#file1589098line748>
> >
> >     I don't think the conversion to UTF-8 is appropiate here.
> 
> Andrew Schwartzmeyer wrote:
>     What would you convert it to? It's currently in UTF-16, and Windows paths 
> are allowed to have (almost) any Unicode character.
> 
> Alex Clemmer wrote:
>     I think he's saying that bad things will happen if you use strings that 
> have Unicode in them, so it's probably better to just error out.
> 
> Andrew Schwartzmeyer wrote:
>     I'm not following... That is a case our codebase must deal with, NTFS 
> paths are stored in Unicode:
>     
>     
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
>     
>     and on top of that, Windows file paths can:
>     
>     > Use any character in the current code page for a name, including 
> Unicode characters and characters in the extended character set (128–255)
>     
>     
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
>     
>     So what I'm saying is that bad things _must not happen_ if we have paths 
> with non-ASCII characters in them; we need to handle that at any point we 
> deal with file paths on Windows.
>     
>     I'm not saying this function is likely to return a path like `C:\ÂÑÐ¥`, 
> but that would be a valid file path, and so we must be treating all our paths 
> on Windows as Unicode.
>     
>     If this is not the case, are purposefully constraining ourselves to 
> ASCII, and if so, why?
> 
> Alex Clemmer wrote:
>     Sorry, I meant bad things will happen to Mesos if you give it strings 
> with unicode in them. Mesos itself does not deal with Unicode gracefully.
> 
> Andrew Schwartzmeyer wrote:
>     Oooh. We should probably fix that.
> 
> Andrew Schwartzmeyer wrote:
>     Alex, Daniel, do we have a resolution on this? I spoke with Alex and we 
> feel that we should be doing two main things: avoid exacberating the problem 
> in MESOS-6817 by being explicit in our use of `W` APIs, rather than relying 
> on the mercurial value of `TCHAR`; and not attempt to guard against paths 
> that Mesos does not handle, as it is a bug that Mesos does not handle paths 
> with valid extended characters (since both Linux and Windows may have these 
> paths), so failing here would be inappropriate as it hides the actual bug. 
> (Not to speak for you Alex, I think this summarizes our discussion yesterday.)
> 
> Alex Clemmer wrote:
>     If I'm understanding our discussion correctly, failing here on unicode 
> conversion would be preferable, since it makes a subtle bug more obvious. If 
> that's nontrivial I think we can just ship it as is, since Mesos already has 
> these problems.
> 
> Andrew Schwartzmeyer wrote:
>     > failing here on unicode conversion
>     
>     What would fail here?
>     
>     I *think* you're implying to add a detection algorithm to see if there 
> are any characters in the path that Mesos does not believe are valid; but 
> that is not "failing on Unicode", we'd need to write our own validator for 
> Mesos.
> 
> Alex Clemmer wrote:
>     I was just saying, if there's an easy way to try to interpret the string 
> as normal ASCII and return `Error` if that fails, we should do that. If not, 
> it's not that big of a deal, because Mesos will probably fail anyway.
> 
> Andrew Schwartzmeyer wrote:
>     Ah I see what you want: 
>     > interpret the string as normal ASCII and return Error if that fails
>     
>     but I don't believe this is possible. To be clear, the non-Unicode 
> versions of these APIs do not return ASCII; they return the same data encoded 
> with the current Windows code page (which is a superset of ASCII, in much the 
> same way as UTF-8). If we used this version, we would not only hit the same 
> problem (possible non-ASCII characters), but also introduce a new problem of 
> having to decode from an arbitrary Windows code page (rather than UTF-16 
> which is at least standard). With either the short or wide version of the 
> API, detecting if the string is not ASCII would mean implmenenting a 
> validator, which I don't think we want to do here (it's the wrong solution to 
> the problem).
>     
>     That said, what would you suggest?
>     
>     P.S. For what it's worth, I don't expect this particular API to give us 
> non-ASCII characters, but I think we should choose a pattern for dealing with 
> Windows APIs and stick to it. This pattern gives us guaranteed UTF-8 strings 
> with whatever data we're expected to handle.
> 
> Alex Clemmer wrote:
>     I suggest just punting on returning `Error` if the conversion fails, 
> since Mesos does not handle unicode anyway. Daniel's suggestion to avoid the 
> conversion to unicode should probably also be adopted.

Summarizing offline discussion:

The explicit use of `GetAllUsersProfileDirectoryW` for UTF-16 to UTF-8 is being 
kept, because the only alternative is converting Windows code page to UTF-8 (or 
not convert it at all which leaves us with more problems). Either conversion 
(or no conversion at all) may leave us with non-ASCII characters in the path, 
which Mesos has a known issue with. The Unicode version of the function is 
preferred to the Windows code page version because a) the latter is deprecated 
for the former by Windows, and b) the Unicode conversion is built-into C++, 
while we do not have a conversion algorithm for Windows code page; and it is 
preferred to not reencoding at all so that we don't have random Windows code 
page encodings in the code base.

By converting to UTF-8, we're left with the status quo of Mesos on Linux: UTF-8 
encoded strings, which may or may not have non-ASCII characters in them.

It's been suggested to validate that our paths have only ASCII, and fail 
gracefully; however, that is not implemented in this patch as it is a separate 
issue. If it's decided it should be done, we'll fix the whole code base instead 
of this one path.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54877/#review159672
-----------------------------------------------------------


On Jan. 3, 2017, 10:14 p.m., Andrew Schwartzmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54877/
> -----------------------------------------------------------
> 
> (Updated Jan. 3, 2017, 10:14 p.m.)
> 
> 
> Review request for mesos, Daniel Pravat, Alex Clemmer, and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The API `SHGetKnownFolderPath` requires `Shell32.dll`,
> which is not available on Nano server.
> The equivalent API `GetAllUsersProfileDirectory`
> only requires `Userenv.dll`, which is available on Nano.
> 
> This API is also friendlier, as we own the allocation.
> 
> The Unicode version `GetAllUsersProfileDirectoryW` is
> explicitly used so that we are guaranteed a Unicode path,
> which we then convert from UTF-16 to UTF-8,
> instead of using the ANSI version which depends on a
> varying Windows code-page, and is not recommended.
> 
> A `vector<wchar_t>` is used over a `wstring` to avoid dealing
> with the placement of the null-terminating character.
> 
> 
> Diffs
> -----
> 
>   3rdparty/stout/include/stout/windows.hpp 
> e641c46d033372e1b6c9f9c066b1ad4957d55088 
>   3rdparty/stout/include/stout/windows/os.hpp 
> 5cd92545a49648e39e8eb7cf131895e9cfc97902 
> 
> Diff: https://reviews.apache.org/r/54877/diff/
> 
> 
> Testing
> -------
> 
> cmake && msbuild, attach agent to master and check default `runtime_dir` 
> value.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>

Reply via email to