Saving files on external discs and moving the discs between
Windows/Mac/Linux can cause lots of character set issues. I have seen
this more often than not.
-Windows- machines write filenames in the machine's local encoding
(often cp-1252, at least in Europe) to FAT-type filesystems, and UTF-16
to NTFS filesystems. Furthermore, in the Win32 namespace, one can use
any UTF-16 code unit (case-insensitive) except U+0000 (NUL), / (slash),
\ (backslash), : (colon), * (asterisk), ? (question mark), " (quote),
< (less than), > (greater than) and | (pipe), while in the POSIX
namespace, all characters except U+0000 (NUL) and / (slash) are allowed.
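To make the two rule sets concrete, here is a minimal Python sketch that reports which characters of a filename each namespace would reject. The function name `forbidden_chars` is made up for illustration, and the sketch only covers the characters listed above, nothing else Windows may object to.

```python
# Characters listed above as forbidden in the Win32 namespace.
WIN32_FORBIDDEN = set('\x00/\\:*?"<>|')
# The POSIX namespace only forbids NUL and the slash.
POSIX_FORBIDDEN = set('\x00/')


def forbidden_chars(name, namespace="win32"):
    """Return the sorted characters in `name` that the namespace forbids."""
    forbidden = WIN32_FORBIDDEN if namespace == "win32" else POSIX_FORBIDDEN
    return sorted(set(name) & forbidden)


print(forbidden_chars("What: now?.mp3"))           # [':', '?']
print(forbidden_chars("What: now?.mp3", "posix"))  # []
```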
-Linux- machines usually write filenames in the machine's locale
encoding to all types of filesystems, including FAT and FAT32, though
I believe the NTFS-3G filesystem driver remaps to UTF-16 internally.
With non-ASCII characters in filenames, this often leads to very
»odd« problems: getting UTF-8 encoded names after transporting a drive
from Linux to Windows, or getting undecodable cp-1252 encoded filenames
in Linux when Linux is set to use UTF-8 internally and you mount a drive
that has been written to by Windows machines.
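Both failure modes can be reproduced in a few lines of Python. This is only an illustration of the encoding mismatch, using a made-up example filename; it does not touch any real filesystem.

```python
name = "Björk.mp3"  # example filename, chosen for illustration

# Windows side wrote cp1252 bytes, Linux side tries to read them as UTF-8.
cp1252_bytes = name.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Linux side wrote UTF-8 bytes, Windows side reads them as cp1252.
mojibake = name.encode("utf-8").decode("cp1252")

print(decodable)  # False
print(mojibake)   # BjÃ¶rk.mp3
```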
I believe that your case would be the latter: the files were written
by a Windows machine onto a FAT32 filesystem, and that filesystem is
now mounted on a Linux machine that uses UTF-8, so the Linux machine
is unable to decode characters above 0x7f in filenames.
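To check whether a mounted drive is affected, one could list the raw byte names and see which ones fail to decode as UTF-8. A rough sketch, with hypothetical helper names `undecodable` and `scan`:

```python
import os


def undecodable(raw_names):
    """Return the byte-string names that are not valid UTF-8."""
    bad = []
    for raw in raw_names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def scan(directory):
    """Check one directory (not recursive) for names that are not UTF-8."""
    # Passing bytes to os.listdir returns the raw, undecoded names.
    return undecodable(os.listdir(os.fsencode(directory)))


print(undecodable([b"Bj\xf6rk.mp3", b"plain.mp3"]))  # [b'Bj\xf6rk.mp3']
```

Calling `scan()` on the mount point of the external drive would then list the names that need attention.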
In such cases, I usually recommend copying the files over the network,
using a Samba setup that has correct »charset« settings in smb.conf.
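As a rough idea of what those settings look like, the fragment below uses the two charset parameters from the smb.conf manual. The values are assumptions for a UTF-8 Linux server with Western European clients and will need adjusting to your setup.

```ini
[global]
    # Encoding Samba uses for filenames on the Linux side (assumed UTF-8 here).
    unix charset = UTF-8
    # Code page used when talking to old DOS-style clients (assumed value).
    dos charset = CP850
```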
For the more adventurous types, it might be worth trying to use an
NTFS-formatted external drive for swapping data between Windows and
Linux, because the (relatively new) NTFS-3G filesystem driver is able
to read DOS, Win32 and POSIX namespaces correctly from NTFS-formatted
discs. When -writing- to the disc, it will -only- write in the POSIX
namespace, which is perfectly legal for Win32 systems, though some
applications might get confused. -Plus- you MUST be sure not to write
characters into filenames that are perfectly legal in the POSIX
namespace but not in the Win32 namespace, e.g. »\:*?« and so on. Also
be aware that the POSIX namespace for NTFS is -case-sensitive-, like
Linux filesystems.
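The case-sensitivity point is easy to overlook: two names that differ only in case can coexist when written from Linux, but look like the same file to Win32 applications. A small sketch to spot such pairs in a list of names follows. `case_collisions` is a made-up helper, and Python's `casefold()` is only an approximation of the case rules Windows really applies.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() approximates case-insensitive comparison (assumption).
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["Track.mp3", "track.mp3", "other.mp3"]))
# [['Track.mp3', 'track.mp3']]
```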
A few words of caution:
- Use NTFS-3G with a little caution. It is quite new and -might- have
a few bugs still.
- Trying to re-tag on Linux might -not- help, since some of the
filenames using extended characters are not valid UTF-8 and thus
illegal on a Linux system that uses UTF-8.
- Setting your Linux machine to the same internal character encoding
that your Windows system has will most probably break other things
(like SC and MusicIP), because many applications rely on UTF-8
support.
Another, easier solution for the not-so-adventurous types might be:
- Connect the external drive back to your Windows system.
- Rename all files to contain only -ASCII- characters (-not ANSI- or
-ISO-8859-x-!), i.e. »0-9,A-Z,a-z,_«.
- Connect the external drive back to the Linux machine.
- (Optional) Rename all files however you want them under Linux,
using UTF-8 encoding. (-Warning:- This will make files containing
extended characters break under Windows if the disc is ever
transported back.)
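If there are too many files to rename by hand, the ASCII-only names from the second step could be computed with a small script. This is just a sketch: `ascii_name` is a hypothetical helper, and keeping the dot (so the file extension survives) is my addition to the character set given above.

```python
import re
import unicodedata


def ascii_name(name):
    """Reduce `name` to 0-9, A-Z, a-z, underscore and the dot."""
    # Split accented letters into base letter + accent, then drop non-ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every remaining character outside the allowed set.
    return re.sub(r"[^0-9A-Za-z_.]", "_", ascii_only)


print(ascii_name("Björk - Jóga.mp3"))  # Bjork___Joga.mp3
```

Letters without a decomposition (such as »ß« or »ø«) are simply dropped here, and two different names may end up identical, so check the results before actually renaming anything.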
--
Moonbase
Moonbase: 'The Problem Solver' (http://www.kaufen-ist-toll.de/moonbase)