doggod wrote: 
> natürlich kenne ich das deutsche ü und sollte mich wirklich daran
> erinnern, dass ä auch verwendet wird. Eigentlich einmal Deutsch gelernt
> ... das war erst vor> 30 Jahren. Vielleicht erklärt das die Sache ...
> Ok I admit, used google translate for that one, most likely very poor
> german? 
> 
Actually, this isn't bad at all...

> 
> One thing though, so if i understand this right, it (this encoding
> "issue") has nothing to do with the Trackstat plug-in?
> But oh shit, just come to think about. You wrote "(in UTF-8, generally
> used by modern file systems) " ...my main library from windows naturally
> is on a NTFS disk BUT my test disk (both are external USB disks) is
> ExFat! Could this be the reason? Should I make me a "testlibrary" on a
> NTFS disk instead? Would that work under linux? For now I use Dietpi
> which is based on Debian.
> 
I glossed over some aspects in my explanation to keep it simple, but I
guess we'll need more details.
The file system (FS) is only part of it, there's also the OS (which will
have a system-wide default encoding) and the application (Perl / LMS /
Trackstat in this case).
So even if the FS in theory could use Unicode (both NTFS and exFAT can),
the OS can still encode file names differently (Windows-1252 in the case
of legacy Microsoft Windows), which will determine how they are saved in
the FS. If such an FS is mounted on a system using a different encoding,
the OS (or more precisely the FS driver) can either convert the names (in
Linux, via the codepage and nls mount options - that's why you can use an
NTFS drive with Windows-1252 file names even though Linux uses UTF-8 as
system encoding) or fail gracelessly (the Windows approach).
Trackstat on Windows will therefore get file names as Windows-1252
encoded strings, and this is how they are written to the XML files
(percent-encoded, but still). A more portable method would have been to
convert these names from Windows-1252 to UTF-8 before percent-encoding
and writing the XML, which would have made all of this a non-issue.
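
To illustrate the difference, here is a minimal sketch in Python (used
only for illustration, Trackstat itself is Perl; the function name
percent_encode is made up) showing the same name percent-encoded once
from Windows-1252 bytes and once from UTF-8 bytes:

```python
from urllib.parse import quote

def percent_encode(name: str, encoding: str) -> str:
    """Turn `name` into bytes using `encoding`, then percent-encode those bytes."""
    # safe="" so that every reserved character (including "/") is escaped
    return quote(name, safe="", encoding=encoding)

name = "Blue Öyster Cult"
legacy = percent_encode(name, "cp1252")   # Ö is the single byte D6
portable = percent_encode(name, "utf-8")  # Ö is the two bytes C3 96

print(legacy)    # Blue%20%D6yster%20Cult
print(portable)  # Blue%20%C3%96yster%20Cult
```

The two results differ only in the bytes for Ö, which is exactly the
mismatch between the two XML files.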

> 
> I think I understand the thing with different encodings in general
> terms. But from yours ;
> "Blue%2520%25C3%2596yster%2520Cult means c396 (hexadecimal) is Ö (in
> UTF-8, generally used by modern file systems)
> Blue%2520%25D6yster%2520Cult means d6 (hexadecimal) is Ö (in
> Windows-1252, used by legacy components of Microsoft Windows)"
> 
> There's parts in your explanation missing that confuses me. You say
> "c396" but I read "%2520%25C3%2596" from "Blue" to "yster" ...
> I can see there's a C3" but then there's "%25" before the"96" part, how
> does one know that one should read it as "c396"?
> I'm trying hard to see a pattern but sadly can't :-/ Does "%2520" mean
> "space"? What does "%25" before "C3" and "D6" mean?
> 
I assume you have read and understood percent-encoding (see #3).
In the case of the Trackstat XML file, there's another twist: For no
apparent reason, Trackstat encodes the file name twice (but the folder
name only once).
So decoding %2520%25C3%2596 once gives %20%C3%96 (since %25 is a literal
%), and decoding it a second time gives " Ö", since %20 is a space and
%C3%96 is Ö. Percent-encoding uses one %XX group per byte, but UTF-8 can
be multi-byte, so the two bytes C3 and 96 together form c396.
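
The two decoding passes can be tried out with a few lines of Python (a
sketch for illustration only; decode_twice is a made-up helper, not
something LMS or Trackstat provides):

```python
from urllib.parse import unquote

def decode_twice(value: str, encoding: str) -> str:
    """Undo double percent-encoding; `encoding` is how the bytes are interpreted."""
    once = unquote(value)  # first pass only yields ASCII: %25 -> %
    return unquote(once, encoding=encoding)  # second pass yields the real bytes

linux = decode_twice("Blue%2520%25C3%2596yster%2520Cult", "utf-8")
windows = decode_twice("Blue%2520%25D6yster%2520Cult", "cp1252")

# Reading the Windows-made entry as UTF-8 (what a UTF-8 system would do)
# cannot work: the lone byte D6 is not valid UTF-8 and is replaced.
mismatch = decode_twice("Blue%2520%25D6yster%2520Cult", "utf-8")

print(linux)     # Blue Öyster Cult
print(windows)   # Blue Öyster Cult
print(mismatch)  # the Ö shows up as U+FFFD, the replacement character
```

Both entries decode to the same name, but only if the second pass uses
the encoding the entry was written with.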

> 
> Is there any converter tool that can be used on a *.xml text file with
> Windows-1252 encoded file URL's that convert it to UTF-8? 
> 
Not that I know of.
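
That said, such a conversion could probably be scripted. The following is
an untested-against-Trackstat sketch in Python with made-up function
names. It assumes that every percent-encoded byte >= 0x80 in the text is
Windows-1252, that file names are encoded twice (%25XX) and folder names
once (%XX), and that the file has not been converted already:

```python
import re

def _reencode(text: str, prefix: str, src: str) -> str:
    """Re-encode runs of `prefix`XX escapes (bytes >= 0x80) from `src` to UTF-8."""
    pattern = re.compile("(?:" + re.escape(prefix) + "[89A-Fa-f][0-9A-Fa-f])+")

    def fix(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace(prefix, ""))
        # Raises UnicodeDecodeError for the few bytes undefined in cp1252
        utf8 = raw.decode(src).encode("utf-8")
        return "".join(f"{prefix}{b:02X}" for b in utf8)

    return pattern.sub(fix, text)

def cp1252_urls_to_utf8(text: str, src: str = "cp1252") -> str:
    """Convert percent-encoded Windows-1252 bytes in `text` to UTF-8 ones."""
    text = _reencode(text, "%25", src)  # double-encoded part (file names)
    return _reencode(text, "%", src)    # single-encoded part (folder names)

print(cp1252_urls_to_utf8("Blue%2520%25D6yster%2520Cult"))
print(cp1252_urls_to_utf8("M%F6tley%20Cr%FCe"))
```

Running it a second time on the same text would damage it (the UTF-8
bytes would be treated as Windows-1252 again), so work on a copy of the
backup file and check the result before importing it.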

> 
> Hmm, maybe I’m asking the wrong question? Guess what I first and
> foremost would like to understand is exactly what is it that makes (if
> we stay with the example Ö) An “ö” appear as “25D6” in the Trackstat on
> Windows created xml backup file and respectively like “25C3%2596”on the
> linux install? 
> Is it only because of two different OS’s? 
> 
OSes that use different default system encodings, and an application
(Trackstat) that doesn't use a cross-platform file format.

> 
> Can any setting make a change to this? 
> 
Not that I know of.

> 
> Or is it just as simple as there’s no way around it other than the long
> and hard way of search and edit the *xml backup file? Also, can it mean
> that basically there could be more or other "characters/letters" that is
> not interpreted correctly by LMS/trackstat Linux install?
> 
Probably any letter not in the English alphabet (umlauts, accents,
etc.)

> 
> If so I think I give up my linux project.
> 
Well, umlauts are only 6 search/replace operations. It'll depend on
whether you also have accents, Nordic characters etc.
Also note that this is not really about Linux; it will affect you
whenever you migrate away from legacy Windows (possibly even if the
destination is a newer Windows, see
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows - not sure
about that).
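
For the umlaut search/replace operations, the search and replacement
strings can be generated rather than worked out by hand. A small Python
sketch (double_encoded is a made-up helper; it assumes the
double-encoded form used for file names, for the single-encoded folder
part drop the "25"):

```python
def double_encoded(char: str, encoding: str) -> str:
    """Return `char` as it appears after being percent-encoded twice."""
    return "".join(f"%25{b:02X}" for b in char.encode(encoding))

# search string (Windows-1252) -> replacement string (UTF-8)
pairs = {
    double_encoded(c, "cp1252"): double_encoded(c, "utf-8")
    for c in "äöüÄÖÜ"
}

for search, replace in pairs.items():
    print(search, "->", replace)
```

For Ö this gives %25D6 -> %25C3%2596, matching the two variants quoted
above.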

Personally, I simply avoid any non-English characters in file names for
files which may end up somewhere else (e.g. car audio systems etc.),
although nowadays UTF-8 is mostly a safe bet.


------------------------------------------------------------------------
Roland0's Profile: http://forums.slimdevices.com/member.php?userid=56808
View this thread: http://forums.slimdevices.com/showthread.php?t=112495
