doggod wrote:
> natürlich kenne ich das deutsche ü und sollte mich wirklich daran
> erinnern, dass ä auch verwendet wird. Eigentlich einmal Deutsch gelernt
> ... das war erst vor> 30 Jahren. Vielleicht erklärt das die Sache ...
> Ok I admit, used google translate for that one, most likely very poor
> german?

Actually, this isn't bad at all...
> One thing though, so if i understand this right, it (this encoding
> "issue") has nothing to do with the Trackstat plug-in?
> But oh shit, just come to think about. You wrote "(in UTF-8, generally
> used by modern file systems) " ...my main library from windows naturally
> is on a NTFS disk BUT my test disk (both are external USB disks) is
> ExFat! Could this be the reason? Should I make me a "testlibrary" on a
> NTFS disk instead? Would that work under linux? For now I use Dietpi
> which is based on Debian.

I glossed over some aspects in my explanation to keep it simple, but I guess we'll need more details.

The file system (FS) is only part of it; there's also the OS (which has a system-wide default encoding) and the application (Perl / LMS / Trackstat in this case). So even if the FS can in theory use Unicode (both NTFS and exFAT can), the OS can still encode file names differently (Windows-1252 in the case of legacy Microsoft Windows), which determines how they are saved in the FS.

If such a FS is mounted on a system using a different encoding, the OS (or more precisely the FS driver) can either convert the names (in Linux, via the codepage and nls mount options; that's why you can use an NTFS drive with Windows-1252 file names even though Linux uses UTF-8 as system encoding) or fail gracelessly (the Windows approach).

Trackstat on Windows will therefore get file names as Windows-1252 encoded strings, and this is how they are written to the XML files (percent-encoded, but still). A more portable method would have been to convert these names from Windows-1252 to UTF-8 before percent-encoding and writing the XML, which would have made all of this a non-issue.

> I think I understand the thing with different encodings in general
> terms.
> But from yours ;
> "Blue%2520%25C3%2596yster%2520Cult means c396 (hexadecimal) is Ö (in
> UTF-8, generally used by modern file systems)
> Blue%2520%25D6yster%2520Cult means d6 (hexadecimal) is Ö (in
> Windows-1252, used by legacy components of Microsoft Windows)"
>
> There's parts in your explanation missing that confuses me. You say
> "c396" but I read "%2520%25C3%2596" from "Blue" to "yster" ...
> I can see there's a C3" but then there's "%25" before the"96" part, how
> does one know that one should read it as "c396"?
> I'm trying hard to see a pattern but sadly can't :-/ Does "%2520" mean
> "space"? What does "%25" before "C3" and "D6" mean?

I assume you have read and understood percent-encoding (see #3). In the case of the Trackstat XML file, there's another twist: for no apparent reason, Trackstat encodes the file name twice (but the folder name only once). So decoding %2520%25C3%2596 once gives %20%C3%96 (since %25 is a literal %), and decoding it a second time gives " Ö", since %20 is a space and %C3%96 is Ö. Percent-encoding uses one %XX group per byte, but UTF-8 can be multi-byte, so these two bytes (C3 and 96) are combined into c396.

> Is there any converter tool that can be used on a *.xml text file with
> Windows-1252 encoded file URL's that convert it to UTF-8?

Not that I know of.

> Hmm, maybe Im asking the wrong question? Guess what I first and
> foremost would like to understand is exactly what is it that makes (if
> we stay with the example Ö) An ö appear as 25D6 in the Trackstat on
> Windows created xml backup file and respectively like 25C3%2596on the
> linux install?
> Is it only because of two different OSs?

OSes that use different default system encodings, and an application (Trackstat) that doesn't use a cross-platform file format.

> Can any setting make a change to this?

Not that I know of.

> Or is it just as simple as theres no way around it other than the long
> and hard way of search and edit the *xml backup file?
> Also, can it mean
> that basically there could be more or other "characters/letters" that is
> not interpreted correctly by LMS/trackstat Linux install?

Probably any letter not in the English alphabet (umlauts, accents, etc.)

> If so I think I give up my linux project.

Well, umlauts are only 6 search/replace operations. It'll depend on whether you also have accents, Nordic characters etc.

Also note that this is not really about Linux: it will affect you whenever you migrate away from legacy Windows (possibly even if the destination is a newer Windows, see 'here' (https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows); not sure about that).

Personally, I simply avoid any non-English characters in file names for files which may end up somewhere else (e.g. car audio systems etc.), although nowadays UTF-8 is mostly a safe bet.

'Various SW' (https://www.nexus0.net/pub/sw/): Web Interface | Playlist Editor / Generator | Music Classification | Similar Music | Announce | EventTrigger | LMSlib2go | ...
'Various HowTos' (https://www.nexus0.net/pub/documents/LMS/): build a self-contained LMS | Bluetooth/ALSA | Control LMS with any device | ...
------------------------------------------------------------------------
Roland0's Profile: http://forums.slimdevices.com/member.php?userid=56808
View this thread: http://forums.slimdevices.com/showthread.php?t=112495
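For illustration only, the double decoding and the Windows-1252 to UTF-8 conversion described in the post could be sketched in Python roughly like this. It is not an existing converter tool and has not been tried against a real Trackstat backup. The function name `cp1252_to_utf8` is made up, the "already UTF-8?" check is a heuristic, and re-encoding with `safe=""` is an assumption about which characters Trackstat escapes; the sketch only reproduces the Blue Öyster Cult example from the thread.

```python
from urllib.parse import quote, unquote_to_bytes


def cp1252_to_utf8(name: str) -> str:
    """Convert a doubly percent-encoded Windows-1252 name to doubly
    percent-encoded UTF-8 (hypothetical helper, see assumptions above)."""
    # Undo both rounds of percent-encoding to get the raw file name bytes.
    raw = unquote_to_bytes(unquote_to_bytes(name))

    # Heuristic: if the bytes are already valid UTF-8 (this includes plain
    # ASCII names), leave the name untouched.
    try:
        raw.decode("utf-8")
        return name
    except UnicodeDecodeError:
        pass

    # Otherwise assume Windows-1252, as written by Trackstat on legacy Windows.
    text = raw.decode("cp1252")

    # Percent-encode as UTF-8, then encode the result a second time so that
    # every "%" becomes "%25", matching the double encoding of file names.
    return quote(quote(text, safe=""), safe="")


if __name__ == "__main__":
    print(cp1252_to_utf8("Blue%2520%25D6yster%2520Cult"))
```

Only the file name part is encoded twice according to the post (the folder name once), so a real script would have to treat the two parts of each URL separately, and it should be run on a copy of the XML backup.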
_______________________________________________
unix mailing list
[email protected]
http://lists.slimdevices.com/mailman/listinfo/unix
