As we know, with LC it is pretty straightforward to deal with internationalised text for remote databases and unknown user platforms by converting to UTF-8. But I have come across a problem with Linux filenames containing non-ASCII characters which has me befuddled.
My many-years-old app has until now simply required all filenames to be in standard 7-bit ASCII, so it was way past time I brought it up to date. The app talks to a database, media and web site on a Unix (DreamHost) server, using LC Server as intermediary.

I create a file, say “Carré.txt”, on a Mac. The non-ASCII character in that name is [e-acute]; I shall use this bracket convention from now on to ensure that what is displayed here on the forum is understood. BTW, as far as I can determine, that character in the Mac file system is the single byte hex [8E], the classic MacRoman encoding, not its 2-byte UTF-8 encoding [C3A9]. So I don’t understand how macOS handles Unicode in its filesystem, which it certainly does. We are exhorted to textEncode to UTF-8 when exporting anything outside LC, but perhaps not filenames?? If I textEncode the filename and save with that name, I get a new file “Carr[squareroot][copyright].txt”. I am befuddled already (how does macOS distinguish MacRoman encoding from Unicode encoding when it displays a file name?), but that is another story for another place.

Oh, and another story: it ain't true that all text in LC is UTF-16. While it’s not possible using LC APIs to determine exactly what is inside the black box of an LC variable in memory, it is evidently platform dependent: that MacRoman [8E] is reported as being the relevant byte in the LC variable. What can be determined is what is on disk when a stack is saved: there, text appears to be encoded as a mixture of 7-bit ASCII where that is possible and UTF-16 for other characters. Not that we as consumers need to know how the magic is performed, as long as it works.

Back to my story. So now I want to upload this file to my remote Linux server. I POST a form, prepared with libURLMultiPartFormData, to an LC Server script, which is supposed to save the received file.
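
For anyone who wants to check the byte values quoted above outside LC, here is a minimal Python sketch. Python is only my choice for illustration and has nothing to do with how LC works internally. It assumes the [e-acute] is the single precomposed character U+00E9, and the variable names are made up for the example. It shows the MacRoman and UTF-8 bytes for the file name, and what the UTF-8 bytes look like when they are read back as MacRoman text.

```python
# Byte-level view of "Carr[e-acute].txt" under the two encodings discussed.
# Assumption: the e-acute is the precomposed code point U+00E9.
name = "Carr\u00e9.txt"

# Classic MacRoman: e-acute is the single byte 8E.
mac_roman_bytes = name.encode("mac_roman")

# UTF-8: e-acute becomes the two bytes C3 A9.
utf8_bytes = name.encode("utf-8")

# The same UTF-8 bytes misread as if they were MacRoman text:
# C3 is the square-root sign and A9 is the copyright sign in MacRoman.
misread = utf8_bytes.decode("mac_roman")

print(mac_roman_bytes.hex(" "))  # 43 61 72 72 8e 2e 74 78 74
print(utf8_bytes.hex(" "))       # 43 61 72 72 c3 a9 2e 74 78 74
print(misread)                   # Carr[squareroot][copyright].txt
```

So the [squareroot][copyright] pair is at least consistent with correctly encoded UTF-8 bytes being interpreted somewhere along the way as MacRoman. It does not tell me where that interpretation happens.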
If I attempt to use the original Mac file name, the server responds “Cannot open file Carr[e-acute].txt” (this is the Result error message from "open file tFileName for binary write”).

If I send textEncode(filename, utf-8) as the file name, the server responds “Cannot open file Carr[squareroot][copyright].txt”.

If I textEncode at the client end and then textDecode on the server, it responds “Cannot open file Carre[E-grave].txt”. (Where did THAT come from? Is there a bug in textDecode on Linux LCS? The native encoding on Linux is supposed to be ISO-Latin-1, where E-grave is hex [C8]; in MacRoman it is [E9]. I see no apparent connection between those and the UTF-8 bytes.)

And just as a piece of nonsense, if I send the raw un-encoded Mac file name but then textDecode on the server, the file is happily saved as “Carr.txt”. That is correct, since [8E] followed by “.” is illegal as UTF-8, so the [e-acute] is simply skipped by textDecode.

Could it be that LC Server cannot create files on Linux with non-ASCII names?!? That doesn’t seem believable. I can of course create files directly on the server with non-ASCII characters such as e-acute.

Either I am missing something, or surely our European users have seen this already, so someone should be able to unfuddle me!

Neville Smythe