On Mon, 27 Apr 2015 15:13:46 +0200, Mike Hearn <m...@plan99.net> wrote:

> Thus this may not be a bug in Java so much as a design problem/oversight
> with the operating systems themselves.

> Note that the issue you're running into is *not* to do with encodings.
> It's not a UTF-8 vs UTF-16 type issue. Rather, the issue is that Unicode
> allows visually identical strings to be represented differently at the
> logical layer, using different sequences of code points.

Yes, I understand.
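
For readers following along, here is a minimal sketch of the phenomenon, using java.text.Normalizer from the JDK (the class name is mine, for illustration):

    import java.text.Normalizer;

    public class NormalizationDemo {
        public static void main(String[] args) {
            String composed = "\u00E9";    // "é" as the single code point U+00E9 (NFC)
            String decomposed = "e\u0301"; // "e" followed by a combining acute accent (NFD)

            System.out.println(composed.equals(decomposed)); // false: different code points
            System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC)
                    .equals(composed));                      // true after normalisation
        }
    }

Both strings render as "é", but they only compare equal after being brought to the same normalisation form.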


> You didn't say what app originally saved the files. However, what exact
> sequence of code points you get on disk for a given piece of human readable
> text can depend on things as varying as what input method editor the user
> typed the file name with, precisely what combination of keys they pressed
> and when, what libraries the app used, and so on.

They were rsynced from Mac OS X. Actually I thought it could be related to the piece of software that brought the files onto the RPi, but in the end - thinking in general - a user could transfer the files in any way, and I must be able to deal with them.

Yes, it's a mess.

> If you encounter such situations frequently then your best bet may be to
> simply write a little wrapper that tries different normalisations until it
> finds one that works.

I feared that. In the end it might even be reasonably doable, if I can take advantage of some preconditions. For instance: is it safe to assume that, given a specific instance of a filesystem, everything is encoded/normalised in the same way? In that case I could run a quick test at the start of the application, find the correct normalisation once and for all, and then always apply the same one. Otherwise, I have to try all the combinations for every file that I open...
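
For what it's worth, such a wrapper might look like the sketch below; the helper name resolveExisting is mine, and java.text.Normalizer ships with the JDK:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.text.Normalizer;

    public class NormalizedLookup {
        // Hypothetical helper: try the file name in every Unicode
        // normalisation form until one matches an existing file.
        public static Path resolveExisting(Path dir, String name) throws IOException {
            for (Normalizer.Form form : Normalizer.Form.values()) {
                Path candidate = dir.resolve(Normalizer.normalize(name, form));
                if (Files.exists(candidate)) {
                    return candidate;
                }
            }
            throw new IOException("No normalisation of " + name + " found in " + dir);
        }
    }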
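
And the startup test I have in mind could be sketched like this: create a probe file with a known NFC name, list the directory back, and see which form the filesystem actually returns. Class and method names are mine, and the whole idea rests on the assumption that one probe is representative of the entire filesystem:

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.text.Normalizer;

    public class NormalizationProbe {
        // Hypothetical startup probe: create a file whose name is NFC,
        // then read the directory back and check which normalisation
        // form the filesystem actually stored.
        public static Normalizer.Form detect(Path scratchDir) throws IOException {
            final String nfcName = "probe-\u00E9.tmp"; // "é" as the single code point U+00E9
            final Path probe = scratchDir.resolve(nfcName);
            Files.createFile(probe);
            try (DirectoryStream<Path> stream =
                    Files.newDirectoryStream(scratchDir, "probe-*.tmp")) {
                for (Path p : stream) {
                    final String stored = p.getFileName().toString();
                    for (Normalizer.Form form : Normalizer.Form.values()) {
                        if (stored.equals(Normalizer.normalize(nfcName, form))) {
                            return form;
                        }
                    }
                }
                throw new IOException("Could not determine the normalisation form");
            } finally {
                Files.deleteIfExists(probe);
            }
        }
    }

On HFS+ this should report NFD, while on a byte-preserving filesystem like ext4 it should report NFC, since the name comes back exactly as written.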

--
Fabrizio Giudici - Java Architect @ Tidalwave s.a.s.
"We make Java work. Everywhere."
http://tidalwave.it/fabrizio/blog - fabrizio.giud...@tidalwave.it
