On Mon, 27 Apr 2015 15:13:46 +0200, Mike Hearn <m...@plan99.net> wrote:

> Thus this may not be a bug in Java so much as a design problem/oversight
> with the operating systems themselves.

> Note that the issue you're running into is *not* to do with encodings.
> It's not a UTF-8 vs UTF-16 type issue. Rather, the issue is that Unicode
> allows visually identical strings to be represented differently at the
> logical layer, using different sequences of code points.

Yes, I understand.
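
For readers following along, here is a minimal sketch of the phenomenon, using java.text.Normalizer from the JDK (the class name is mine, for illustration):

    import java.text.Normalizer;

    public class NormalizationDemo {
        public static void main(String[] args) {
            String composed = "\u00E9";    // "é" as the single code point U+00E9 (NFC)
            String decomposed = "e\u0301"; // "e" followed by a combining acute accent (NFD)

            System.out.println(composed.equals(decomposed)); // false: different code points
            System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC)
                    .equals(composed));                      // true after normalisation
        }
    }

Both strings render as "é", but they only compare equal after being brought to the same normalisation form.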


> You didn't say what app originally saved the files. However, what exact
> sequence of code points you get on disk for a given piece of human readable
> text can depend on things as varying as what input method editor the user
> typed the file name with, precisely what combination of keys they pressed
> and when, what libraries the app used, and so on.

They were rsynced from Mac OS X. Actually I thought it could be related to the piece of software that brought the files onto the RPi, but in the end - thinking in general - a user could transfer the files in any way, and I must be able to deal with them.

Yes, it's a mess.

> If you encounter such situations frequently then your best bet may be to
> simply write a little wrapper that tries different normalisations until it
> finds one that works.

I feared that. In the end it might even be reasonably doable, if I can take advantage of some preconditions. For instance: is it safe to assume that, given a specific instance of a filesystem, everything is encoded/normalised in the same way? In that case I could run a quick test at the start of the application, find the correct normalisation once and for all, and then always apply the same one. Otherwise, I have to try all the combinations for every file that I open...
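
For what it's worth, such a wrapper might look like the sketch below; the helper name resolveExisting is mine, and java.text.Normalizer ships with the JDK:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.text.Normalizer;

    public class NormalizedLookup {
        // Hypothetical helper: try the file name in every Unicode
        // normalisation form until one matches an existing file.
        public static Path resolveExisting(Path dir, String name) throws IOException {
            for (Normalizer.Form form : Normalizer.Form.values()) {
                Path candidate = dir.resolve(Normalizer.normalize(name, form));
                if (Files.exists(candidate)) {
                    return candidate;
                }
            }
            throw new IOException("No normalisation of " + name + " found in " + dir);
        }
    }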
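
And the startup test I have in mind could be sketched like this: create a probe file with a known NFC name, list the directory back, and see which form the filesystem actually returns. Class and method names are mine, and the whole idea rests on the assumption that one probe is representative of the entire filesystem:

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.text.Normalizer;

    public class NormalizationProbe {
        // Hypothetical startup probe: create a file whose name is NFC,
        // then read the directory back and check which normalisation
        // form the filesystem actually stored.
        public static Normalizer.Form detect(Path scratchDir) throws IOException {
            final String nfcName = "probe-\u00E9.tmp"; // "é" as the single code point U+00E9
            final Path probe = scratchDir.resolve(nfcName);
            Files.createFile(probe);
            try (DirectoryStream<Path> stream =
                    Files.newDirectoryStream(scratchDir, "probe-*.tmp")) {
                for (Path p : stream) {
                    final String stored = p.getFileName().toString();
                    for (Normalizer.Form form : Normalizer.Form.values()) {
                        if (stored.equals(Normalizer.normalize(nfcName, form))) {
                            return form;
                        }
                    }
                }
                throw new IOException("Could not determine the normalisation form");
            } finally {
                Files.deleteIfExists(probe);
            }
        }
    }

On HFS+ this should report NFD, while on a byte-preserving filesystem like ext4 it should report NFC, since the name comes back exactly as written.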

--
Fabrizio Giudici - Java Architect @ Tidalwave s.a.s.
"We make Java work. Everywhere."
http://tidalwave.it/fabrizio/blog - fabrizio.giud...@tidalwave.it
