Re: Linux filenames in LC Server

2023-08-17 Thread Matthias Rebbe via use-livecode
Neville,

The .htaccess solution would not harm other apps either, because the environment 
variable is only set when executing .lc files and it is only used within that 
session. 

Sent from my iPhone

> On 18.08.2023 at 03:07, Neville Smythe via use-livecode wrote:
> 
> Hi Matthias, I didn’t see your post until now. I did wonder if .htaccess 
> could be used with Rewrite rules, but I couldn’t get my head around the 
> documentation. 
> 
> So it’s good to know both methods work. We are running other apps on the 
> website so I wonder a bit if setting the environment variables for everything 
> running under Apache might have some side effects, so I’ll stick with Mark’s 
> launch script method.
> 
> Neville Smythe
> 
> 
> 
> 
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Linux filenames in LC Server

2023-08-17 Thread Neville Smythe via use-livecode
Hi Matthias, I didn’t see your post until now. I did wonder if .htaccess could 
be used with Rewrite rules, but I couldn’t get my head around the 
documentation. 

So it’s good to know both methods work. We are running other apps on the 
website so I wonder a bit if setting the environment variables for everything 
running under Apache might have some side effects, so I’ll stick with Mark’s 
launch script method.

Neville Smythe





Re: Linux filenames in LC Server - Resolved

2023-08-17 Thread matthias rebbe via use-livecode
Hi Neville,

did you read my comment about setting the environment variable LANG using 
.htaccess?
That worked here and I could write non-ASCII filenames using the "standard" LC 
Server installation.

Regards,
Matthias

> On 17.08.2023 at 15:56, Neville Smythe via use-livecode wrote:
> 
> Thank you Mark, installing the launcher script to set environment variables 
> has fixed all my issues with non-ascii filenames. The documentation for 
> installing LCS could usefully make a note of these settings. And indeed 
> TextEncode/Decode both work as expected, my musings there were irrelevant.
> 
> Do you expect a future version of LCServer will incorporate the changes? IMHO 
> I’d agree option 2 would make things work more transparently for the vast 
> majority of cases. And while you wouldn’t be able to handle badly 
> encoded filenames within LC, I guess you wouldn't be able to create them 
> either.
> 
> Neville Smythe
> 


Re: Linux filenames in LC Server - Resolved

2023-08-17 Thread Neville Smythe via use-livecode
Thank you Mark, installing the launcher script to set environment variables has 
fixed all my issues with non-ascii filenames. The documentation for installing 
LCS could usefully make a note of these settings. And indeed TextEncode/Decode 
both work as expected, my musings there were irrelevant.

Do you expect a future version of LCServer will incorporate the changes? IMHO 
I’d agree option 2 would make things work more transparently for the vast 
majority of cases. And while you wouldn’t be able to handle badly encoded 
filenames within LC, I guess you wouldn't be able to create them either.

Neville Smythe





Re: Linux filenames in LC Server

2023-08-16 Thread matthias rebbe via use-livecode
It seems it is possible to set environment variables using rewrite rules in 
.htaccess.

I added the following lines to my .htaccess

```
RewriteEngine on
RewriteRule \.(lc) - [E=LANG:de_DE.UTF-8]
```

The second line tells Apache not to rewrite or redirect the URL (that is what 
the '-' means), but to apply the [E=] flag whenever an .lc file is requested.
The flag sets an environment variable: E=LANG:de_DE.UTF-8
tells Apache to set the variable LANG to the value de_DE.UTF-8.
It's even possible to set cookies that way, using the cookie flag [CO=].

With those two lines I no longer received the 'can't open file' error, and 
the file with a non-ASCII filename (in my script testä.txt) was created by the 
.lc script.
In my FTP client the file testä.txt is shown as testä.txt, but I can still 
access the file from LC Server with its original name testä.txt.



> On 16.08.2023 at 09:34, Mark Waddingham via use-livecode wrote:
> 
> On 2023-08-16 06:37, Neville Smythe via use-livecode wrote:
>> So I misunderstood, I thought we were talking about Apache environment 
>> variables. Indeed the Terminal app reports
>> LANG=C
>> as a system env variable. But if this is not specifically a server problem, 
>> wouldn’t that mean we could see the same behaviour with LC Desktop on Linux 
>> machines running vanilla Ubuntu or Debian (which is what Dreamhost uses)? 
>> I haven’t tried this yet, as it is a bit of a pain to fire up my Linux 
>> emulator machine.
> 
> So the situation here is similar to that which you get on macOS. If you open 
> Terminal, then the (UNIX) environment (variable-wise) which you get will be 
> different from that you get when you double-click on an app to launch it. In 
> the latter case, the executable is launched via the desktop environment's 
> 'launcher' process and will inherit the environment provided by that. 
> Presumably, as Linux desktops mandate various things (like language 
> settings), the locale and environment vars will be set appropriately.
> 
>> An experiment, which makes me wonder if this counts as a configuration 
>> problem or an actual bug in LC Server:
>> In Terminal I type (actually paste) and execute
>> echo "éü" > Carré.txt
>> (for Forum users like me who just see ? everywhere, that is 
>> [e-acute][u-umlaut][happyface emoji] in the content to be written to a file 
>> with [e-acute] in its name)
>> This works without problem. The contents of the file are utf-8 encoded, 
>> which I didn’t need to specify, but I guess that is what the pasteboard 
>> provided. Terminal had no problem creating or finding the file without 
>> needing those env settings. Of course it cannot *display* the file name 
>> without knowing the encoding, so ls reports the filename as 
>> 'Carr'$'\303\251''.txt'
>> (readable as an ascii encoding, though not one I have seen before; note the 
>> single quotes)
> 
> I'm guessing here that this is a remote ssh session to your Linux server, and 
> you are using macOS Terminal app to run and connect? If that is the case then 
> the reason this works is because Terminal on macOS is UTF-8 (which is the 
> *only* encoding macOS supports in its UNIX subsystem so you don't get the 
> variance problem you do with Linux). This means that pasting text from 
> somewhere else will paste the UTF-8 bytes - i.e. they will get transmitted 
> over SSH to the remote linux machine.
> 
> As filenames are just sequences of bytes on Linux this works fine - however 
> when you ask the remote terminal to list the files, it can only interpret the 
> ascii chars (as the LANG is C) and thus emits octal escapes for the others - 
> here this is 0xC3 0xA9 which is the utf-8 encoding of e-acute.
> 
>> If I set up the env variables Mark suggests in the Terminal session
>> export LC_ALL="en_US.UTF8"
>> export LANG="en_US.UTF8"
>> then Terminal is able to display the filename á la française.
> 
> So now the remote terminal knows how to interpret the sequences of bytes 
> present in the filenames, and thus can emit them appropriately.
> 
>> Cyberduck reports this filename correctly using the [e-acute] without having 
>> to set encoding knowledge. And I can also create the file using Cyberduck 
>> with no problems. So IT knows about/expects/sets up the encoding as needed. 
>> I bet other Linux-aware apps would also open or list such files without 
>> drama or special configuration.
> 
> IT doesn't know - it assumes. I suspect that if you used Cyberduck to connect 
> to a Linux server which is setup to *not* be utf-8 (so filenames are encoded 
> with some other encoding), then it would display things incorrectly.
> 
> Of course, if the protocols it deals with specify the text encoding as utf-8 
> *and* the daemons running on said server are setup correctly (i.e. so that 
> they process the filenames and such relative to the server's encoding) *and* 
> they correctly convert the filenames from that encoding to the encoding 
> mandated by the protocol then it would display fine.
> 
> Certainly FTP treats 

Re: Linux filenames in LC Server

2023-08-16 Thread Bob Sneidar via use-livecode
Hah! This reminds me of a time when Windows would allow files whose names 
were legal on the Mac, but NOT legal on Windows, to be written to its own file 
system. The result was that you could save the files, but you could never 
access or delete them, nor could you delete any folder that contained them. 
They had to write utilities to rename such files and they weren't 100% 
successful.

Bob S


On Aug 16, 2023, at 12:34 AM, Mark Waddingham via use-livecode wrote:

I suspect (2) is overall better - its only downside is that you would not be 
able to manipulate files on the server which had badly encoded utf-8 names. 
However, that seems like an extreme edge case; and one which you could work 
around by just setting the LANG env var to a native encoding and putting 
appropriate code in your app to deal with them.

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ 
http://www.livecode.com/
LiveCode: Build Amazing Things


Re: Linux filenames in LC Server

2023-08-16 Thread Mark Waddingham via use-livecode

On 2023-08-16 06:37, Neville Smythe via use-livecode wrote:
> So I misunderstood, I thought we were talking about Apache environment 
> variables. Indeed the Terminal app reports
>
> LANG=C
>
> as a system env variable. But if this is not specifically a server 
> problem, wouldn’t that mean we could see the same behaviour with LC 
> Desktop on Linux machines running vanilla Ubuntu or Debian (which is what 
> Dreamhost uses)? I haven’t tried this yet, as it is a bit of a pain to 
> fire up my Linux emulator machine.


So the situation here is similar to that which you get on macOS. If you 
open Terminal, then the (UNIX) environment (variable-wise) which you get 
will be different from that you get when you double-click on an app to 
launch it. In the latter case, the executable is launched via the 
desktop environment's 'launcher' process and will inherit the environment 
provided by that. Presumably, as Linux desktops mandate various things 
(like language settings), the locale and environment vars will be set 
appropriately.


> An experiment, which makes me wonder if this counts as a configuration 
> problem or an actual bug in LC Server:
>
> In Terminal I type (actually paste) and execute
>
> echo "éü" > Carré.txt
>
> (for Forum users like me who just see ? everywhere, that is 
> [e-acute][u-umlaut][happyface emoji] in the content to be written to a 
> file with [e-acute] in its name)
>
> This works without problem. The contents of the file are utf-8 encoded, 
> which I didn’t need to specify, but I guess that is what the pasteboard 
> provided. Terminal had no problem creating or finding the file without 
> needing those env settings. Of course it cannot *display* the file name 
> without knowing the encoding, so ls reports the filename as 
> 'Carr'$'\303\251''.txt' (readable as an ascii encoding, though not one I 
> have seen before; note the single quotes)


I'm guessing here that this is a remote ssh session to your Linux 
server, and you are using macOS Terminal app to run and connect? If that 
is the case then the reason this works is because Terminal on macOS is 
UTF-8 (which is the *only* encoding macOS supports in its UNIX subsystem 
so you don't get the variance problem you do with Linux). This means 
that pasting text from somewhere else will paste the UTF-8 bytes - i.e. 
they will get transmitted over SSH to the remote linux machine.


As filenames are just sequences of bytes on Linux this works fine - 
however when you ask the remote terminal to list the files, it can only 
interpret the ascii chars (as the LANG is C) and thus emits octal 
escapes for the others - here this is 0xC3 0xA9 which is the utf-8 
encoding of e-acute.
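A quick way to confirm this from the shell: the POSIX sh sketch below rebuilds the name from the octal escapes ls printed and dumps its bytes in hex. `show_bytes` is a made-up helper name; only the standard `od` and `xargs` tools are assumed.

```shell
#!/bin/sh
# show_bytes (hypothetical helper): print each byte of $1 as two hex digits.
show_bytes() {
  # od -An drops the offset column, -v disables duplicate suppression,
  # -tx1 prints one hex value per byte; xargs (echo by default) collapses
  # od's spacing and line breaks into single spaces.
  printf '%s' "$1" | od -An -v -tx1 | xargs
}

# Rebuild the name from the octal escapes GNU ls printed: 'Carr'$'\303\251''.txt'
name=$(printf 'Carr\303\251.txt')
show_bytes "$name"   # prints: 43 61 72 72 c3 a9 2e 74 78 74
```

The two bytes in the middle, c3 a9, are the octal 303 251 from the ls output, i.e. the UTF-8 encoding of e-acute.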



> If I set up the env variables Mark suggests in the Terminal session
>
> export LC_ALL="en_US.UTF8"
> export LANG="en_US.UTF8"
>
> then Terminal is able to display the filename á la française.


So now the remote terminal knows how to interpret the sequences of bytes 
present in the filenames, and thus can emit them appropriately.


> Cyberduck reports this filename correctly using the [e-acute] without 
> having to set encoding knowledge. And I can also create the file using 
> Cyberduck with no problems. So IT knows about/expects/sets up the encoding 
> as needed. I bet other Linux-aware apps would also open or list such files 
> without drama or special configuration.


IT doesn't know - it assumes. I suspect that if you used Cyberduck to 
connect to a Linux server which is setup to *not* be utf-8 (so filenames 
are encoded with some other encoding), then it would display things 
incorrectly.


Of course, if the protocols it deals with specify the text encoding as 
utf-8 *and* the daemons running on said server are setup correctly (i.e. 
so that they process the filenames and such relative to the server's 
encoding) *and* they correctly convert the filenames from that encoding 
to the encoding mandated by the protocol then it would display fine.


Certainly FTP treats filenames as sequences of bytes - so at least for 
that protocol the client would have to assume UTF-8 or be told the 
correct encoding to do the correct thing.


> However: in LC Server when I call "the long files" for the enclosing 
> folder: crash! (Actually an in-line error reported for this code line). To 
> my mind that qualifies as a bug, even if the source of the crash is the 
> same as for open file.


I take it by crash you mean a runtime error is logged, and that this 
only happens if the LANG / LC_ALL environment variables are not set?


This is the same issue as opening a file - the low-level text encoding 
from ASCII to the internal encoding used by strings in the engine will 
be failing because it encounters non-ASCII.


> On the other hand hopefully setting the environment variables as Mark 
> suggests will fix everything. Mark, could I clarify exactly how that 
> “launcher script” is to be used… I’m guessing the cgi configuration should 
> point to that file to be executed when it wants to open myscript.lc 
> instead of pointing to the livecode-server 

Re: Linux filenames in LC Server

2023-08-15 Thread Neville Smythe via use-livecode
Thanks Mark for semi-unfuddling me. It’s good to know that textEncode/Decode is 
not to blame.

But if I may try everyone's patience a little further

> In the case of Linux what encoding such 'sys strings' need to use 
> depends on the environment - the encoding *could* be anything and thus 
> the engine uses the UNIX 'iconv' library to convert from internal 
> representation to the encoded bytes needed. I think this is what is 
> causing the failure of the file APIs - iconv is refusing to convert a 
> string with non-ascii characters to the 'default' 'C' locale as it can't 
> (there is no mapping from, say, e-acute to ascii).

So I misunderstood, I thought we were talking about Apache environment 
variables. Indeed the Terminal app reports

LANG=C

as a system env variable. But if this is not specifically a server problem, 
wouldn’t that mean we could see the same behaviour with LC Desktop on Linux 
machines running vanilla Ubuntu or Debian (which is what Dreamhost uses)? I 
haven’t tried this yet, as it is a bit of a pain to fire up my Linux emulator 
machine.

An experiment, which makes me wonder if this counts as a configuration problem 
or an actual bug in LC Server:

In Terminal I type (actually paste) and execute

 echo "éü" > Carré.txt

(for Forum users like me who just see ? everywhere, that is 
[e-acute][u-umlaut][happyface emoji] in the content to be written to a file 
with [e-acute] in its name)

This works without problem. The contents of the file are utf-8 encoded, 
which I didn’t need to specify, but I guess that is what the pasteboard 
provided. Terminal had no problem creating or finding the file without needing 
those env settings. Of course it cannot *display* the file name without knowing 
the encoding, so ls reports the filename as 'Carr'$'\303\251''.txt' (readable 
as an ascii encoding, though not one I have seen before; note the single quotes)

If I set up the env variables Mark suggests in the Terminal session

export LC_ALL="en_US.UTF8"
export LANG="en_US.UTF8"

then Terminal is able to display the filename á la française.

Cyberduck reports this filename correctly using the [e-acute] without 
having to set encoding knowledge. And I can also create the file using 
Cyberduck with no problems. So IT knows about/expects/sets up the encoding as 
needed. I bet other Linux-aware apps would also open or list such files without 
drama or special configuration.

However: in LC Server when I call "the long files" for the enclosing folder: 
crash! (Actually an in-line error reported for this code line). To my mind that 
qualifies as a bug, even if the source of the crash is the same as for open 
file.

On the other hand hopefully setting the environment variables as Mark suggests 
will fix everything. Mark, could I clarify exactly how that “launcher script” is 
to be used… I’m guessing the cgi configuration should point to that file to be 
executed when it wants to open myscript.lc instead of pointing to the 
livecode-server executable (in which case it might have to have a .cgi suffix 
rather than .txt), or is it a shell script to be executed by livecode-server?

Neville Smythe





Re: Linux filenames in LC Server

2023-08-15 Thread Mark Waddingham via use-livecode

On 2023-08-15 08:42, Neville Smythe via use-livecode wrote:
> So if I understand Mark correctly, while one can create utf-8 encoded 
> filenames directly in a terminal session, LC Server internally accesses 
> Apache environment variables to encode/decode the filename before opening 
> a file rather than directly using the shell. Presumably this has something 
> to do with the engine being a server app having to respect the server 
> environment.


So what is actually happening here is that there is a notion of a 
'SysString' in the engine. A 'SysString' is a string represented as a 
sequence of bytes in whatever encoding the host platform understands in 
its APIs. The engine converts its internal string representation to a 
sys string whenever it accesses a system API - e.g. for opening files.


In the case of Linux what encoding such 'sys strings' need to use 
depends on the environment - the encoding *could* be anything and thus 
the engine uses the UNIX 'iconv' library to convert from internal 
representation to the encoded bytes needed. I think this is what is 
causing the failure of the file APIs - iconv is refusing to convert a 
string with non-ascii characters to the 'default' 'C' locale as it can't 
(there is no mapping from, say, e-acute to ascii).


I should point out that textEncode/Decode do not use system APIs - the 
conversions between UTF* forms and 'native' are all built into the 
engine - so that part is fine - it's the low-level connection between 
commands like 'open file' and calling the UNIX open API which is 
throwing an error on file name conversion.
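The same refusal can be reproduced with the `iconv` command-line tool, which on glibc-based Linux systems sits on top of the same iconv library (an assumption about the host; this is not the engine's own code path). In this minimal sketch `converts_to` is a hypothetical helper: Carré.txt converts to UTF-8, but not to ASCII, which is the only repertoire the 'C' locale knows.

```shell
#!/bin/sh
# converts_to (hypothetical helper): succeed only if the UTF-8 string $1
# can be represented in the target encoding $2.
converts_to() {
  printf '%s' "$1" | iconv -f UTF-8 -t "$2" >/dev/null 2>&1
}

name=$(printf 'Carr\303\251.txt')   # "Carré.txt" as UTF-8 bytes

for enc in UTF-8 ASCII; do
  if converts_to "$name" "$enc"; then
    echo "$enc: ok"
  else
    echo "$enc: cannot convert"   # iconv exits non-zero on the e-acute
  fi
done
```

A pure-ASCII name such as Carre.txt converts either way, which matches the observation that only non-ASCII filenames fail.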


> On Dreamhost, as far as I can determine, the LANG and LC_ALL variables 
> are *not* set (though WordPress is running and it adds support for a 
> swathe of languages, so surely has support for non-ascii filenames?)
> The site is a shared hosting, so I do not have permissions to change the 
> Apache conf files. I tried adding the SetEnv commands in the .htaccess 
> file but that didn’t work, although I could well be doing it wrong,
> I am fumbling around in the dark here.


The only thing I've found so far is SetEnv which does look like it can 
only be configured in the host config for a domain which is slightly 
irksome. However, there is a way to launch the CGI engine with any vars 
needed.


I'm not sure how Dreamhost sets things up - indeed it might be worth 
asking their support if there is a way to configure environment 
variables which are passed through to CGI executables.


If there isn't then it can be done with a launcher script:

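```
#!/bin/sh
# Force a UTF-8 locale so the engine can convert non-ASCII filenames.
export LC_ALL="en_US.UTF8"
export LANG="en_US.UTF8"
# Replace this shell with the engine, passing on any arguments.
# Use the full path to livecode-server in your hosting setup.
exec livecode-server "$@"
```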

This would be a text file which has been made executable - and needs to 
be configured as the executable which is launched when a livecode server 
script is launched (livecode-server in the above needs to be the 
location of the livecode-server executable in the hosting setup).
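One caveat with the launcher: the exported locale only helps if it is actually installed on the host (`locale -a` lists what is available). If it is not, the C library leaves the process in the 'C' locale and the original error comes back. A hedged sketch of a helper that picks an available UTF-8 locale from that listing; the name `pick_utf8_locale` and the preference order (en_US first, then C) are assumptions, not part of the script above.

```shell
#!/bin/sh
# pick_utf8_locale (hypothetical helper): read a `locale -a` style listing on
# stdin and print the first preferred UTF-8 locale found in it.
pick_utf8_locale() {
  available=$(cat)
  for want in en_US C; do
    # glibc lists names such as "en_US.utf8"; also accept "en_US.UTF-8".
    match=$(printf '%s\n' "$available" | grep -i -x -E "$want\.utf-?8" | head -n 1)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1   # no UTF-8 locale available
}

# Possible use in the launcher (adapt the livecode-server path):
#   loc=$(locale -a | pick_utf8_locale) || loc=C
#   export LC_ALL="$loc" LANG="$loc"
#   exec livecode-server "$@"
```

Only the helper runs here; the commented lines show how it might slot into the launcher.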


I know others here use (or have used) Dreamhost in the past - so they 
might know more about how the above could be configured (although, 
again, Dreamhost support can probably help).



> Unless there is some way to fix the configuration, it would seem that not 
> only will opening files fail but the detailed files (the long files) 
> command will also fail if non-ascii characters are encountered since it 
> uses textEncode. I presume that using shell commands could be used as a 
> workaround for accessing the filesystem, as long as LC doesn’t do an 
> internal textEncode as it passes the variables to the shell!
> However it also means one cannot use textDecode/Encode at all, not just 
> for the filenames but also content; and that could be a bummer. I haven’t 
> encountered this so far because to this point I have encoded content 
> before uploading binary files to the server, but I can envision situations 
> where I would want to encode or decode server-side.


The problem isn't with textEncode/Decode - they work fine as mentioned 
above - it's just that the engine doesn't have the necessary information (due 
to lack of env vars) to know how to interpret/create the filenames the 
system APIs need.


> I’m puzzled that this problem hasn’t been raised before. Surely the vast 
> majority of website host providers use Linux servers, and the Dreamhost 
> configuration for shared hosting is most likely standard. So has no-one in 
> Europe (or Asia..) using LC Server wanted to create native-language 
> filenames? I think LC Server is a magnificent tool, but perhaps it is not 
> as widely used as it deserves! Or: they all found the fix and haven’t told 
> us.


This is almost certainly a server setup/config thing - I guess Apache 
runs CGIs in the most 'raw' environment possible by default.
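That 'raw' environment can be approximated from an ssh session with `env -i`, which starts a command with an empty environment. This is only an approximation: Apache also passes CGI variables such as SCRIPT_FILENAME, which this ignores. In the sketch `report_locale_vars` is a hypothetical helper; the second call mimics what the launcher script adds.

```shell
#!/bin/sh
# report_locale_vars (hypothetical helper): run a child shell with exactly the
# variables given as arguments (env -i discards everything inherited) and
# report the locale variables it sees.
report_locale_vars() {
  env -i "$@" /bin/sh -c 'echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"'
}

report_locale_vars                                      # bare environment
report_locale_vars LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8  # as the launcher sets
```

The first call prints `LANG=unset LC_ALL=unset`, the situation in which the engine falls back to the 'C' locale.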


The observation about Wordpress is interesting - certainly before PHP 
was 'unicodified' - the encoding of filenames was up to the script - 
i.e. you had to encode/decode filenames appropriately yourself and I 
guess utf-8 was just assumed. With PHP7 I believe it handles unicode 
transparently a 

Re: Linux filenames in LC Server

2023-08-15 Thread matthias rebbe via use-livecode
What definitely works, at least here, is to urlencode the filename before 
creating it, so that e.g. testä would be created as test%E4.
As urlencode does no "harm", you could use it in general, not only for 
non-ASCII filenames. 
And if you want to display the "real" name, you just have to urldecode the 
filename again.
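The same idea as a POSIX shell sketch, for illustration only (`pct_encode` is a hypothetical name, not a LiveCode function). One difference from the LiveCode result above: this encodes the UTF-8 bytes, so ä becomes %C3%A4, whereas test%E4 corresponds to the single Latin-1 byte for ä. Either way the stored name is plain ASCII, which even the 'C' locale can handle.

```shell
#!/bin/sh
# pct_encode (hypothetical helper): percent-encode every byte of $1 except
# the unreserved characters A-Z a-z 0-9 - . _
pct_encode() {
  out=''
  for h in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$h" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f)
        # Unreserved byte: convert hex to octal and emit it as a character.
        out="$out$(printf "\\$(printf '%03o' "0x$h")")" ;;
      *)
        # Anything else becomes %XX with uppercase hex digits.
        out="$out%$(printf '%s' "$h" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '%s\n' "$out"
}

pct_encode "$(printf 'test\303\244.txt')"   # prints: test%C3%A4.txt
```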





> On 15.08.2023 at 09:42, Neville Smythe via use-livecode wrote:
> 
> Thanks Mark and Matthias
> 
> I think it is clear the problem is not related to variant forms - if I 
> replace [e-acute] by any other non-ascii character, such as a Kanji character 
> or emoji, I get the same “can’t open that file” error. And the weird decoding 
> of [e-acute] to [E-grave] would be explained if textDecode is failing in LC 
> Server.
> 
> So if I understand Mark correctly, while one can create utf-8 encoded 
> filenames directly in a terminal session,  LC Server internally accesses 
> Apache environment variables to encode/decode the filename before opening a 
> file rather than directly using the shell. Presumably this has something to 
> do with the engine being a server app having to respect the server 
> environment.  
> 
> On Dreamhost, as far as I can determine, the LANG and LC_ALL variables are 
> *not* set (though WordPress is running and it adds support for a swathe of 
> languages, so surely has support for non-ascii filenames?) The site is a 
> shared hosting, so I do not have permissions to change the Apache conf files. 
> I tried adding the SetEnv commands in the .htaccess file but that didn’t 
> work, although I could well be doing it wrong, I am fumbling around in the 
> dark here.
> 
> Unless there is some way to fix the configuration, it would seem that not 
> only will opening files fail but the detailed files (the long files) command 
> will also fail if non-ascii characters are encountered since it uses 
> textEncode. I presume that using shell commands could be used as a workaround 
> for accessing the filesystem, as long as LC doesn’t do an internal textEncode 
> as it passes the variables to the shell! 
> 
> However it also means one cannot use textDecode/Encode at all, not just for 
> the filenames but also content; and that could be a bummer. I haven’t 
> encountered this so far because to this point I have encoded content before 
> uploading binary files to the server, but I can envision situations where I 
> would want to encode or decode server-side.
> 
> I’m puzzled that this problem hasn’t been raised before. Surely the vast 
> majority of website host providers use Linux servers, and the Dreamhost 
> configuration for shared hosting is most likely standard. So has no-one in 
> Europe (or Asia..) using LC Server wanted to create native-language 
> filenames? I think LC Server is a magnificent tool, but perhaps it is not as 
> widely used as it deserves! Or: they all found the fix and haven’t told us.
> 
>> So, when you run lc-server from a terminal session directly, it's almost 
>> certainly the case that the LC_ALL and LANG environment variables are 
>> set to en_US.UTF-8 (or some other language code DOT UTF-8 - it is the 
>> UTF-8 which is the important bit).
>> 
>> On Linux, a C API nl_langinfo() is used to fetch the encoding to use 
>> when talking to the system APIs (e.g. filesystem APIs) - this (I 
>> believe) derives its information from LANG/LC_ALL.
>> 
>> If the latter *are not set* then it will likely default to the 'C' 
>> locale which has no interpretation of any non-ascii chars, and thus 
>> attempts to encode/decode utf-8 encoded filenames will fail.
>> 
>> My theory is that these variables are not set in the configuration for 
>> running CGIs in Apache (or whatever web server is being used in this 
>> instance).
>> 
>> Digging around it looks like Apache (at least) has a `SetEnv` directive 
>> which would allow these environment variables to be set, e.g.
>> 
>>  SetEnv LC_ALL en_US.UTF-8
>>  SetEnv LANG en_US.UTF-8
>> 
>> Although I'm not 100% sure where such things go, perhaps someone more 
>> conversant with apache config could chime in to suggest.
> Neville Smythe


Re: Linux filenames in LC Server

2023-08-15 Thread Neville Smythe via use-livecode
Thanks Mark and Matthias

I think it is clear the problem is not related to variant forms - if I replace 
[e-acute] by any other non-ascii character, such as a Kanji character or emoji, 
I get the same “can’t open that file” error. And the weird decoding of 
[e-acute] to [E-grave] would be explained if textDecode is failing in LC Server.

So if I understand Mark correctly, while one can create utf-8 encoded filenames 
directly in a terminal session,  LC Server internally accesses Apache 
environment variables to encode/decode the filename before opening a file 
rather than directly using the shell. Presumably this has something to do with 
the engine being a server app having to respect the server environment.  

On Dreamhost, as far as I can determine, the LANG and LC_ALL variables are 
*not* set (though WordPress is running and it adds support for a swathe of 
languages, so surely has support for non-ascii filenames?) The site is a shared 
hosting, so I do not have permissions to change the Apache conf files. I tried 
adding the SetEnv commands in the .htaccess file but that didn’t work, although 
I could well be doing it wrong, I am fumbling around in the dark here.

Unless there is some way to fix the configuration, it would seem that not only 
will opening files fail but the detailed files (the long files) command will 
also fail if non-ascii characters are encountered since it uses textEncode. I 
presume that using shell commands could be used as a workaround for accessing 
the filesystem, as long as LC doesn’t do an internal textEncode as it passes 
the variables to the shell! 

However it also means one cannot use textDecode/Encode at all, not just for the 
filenames but also content; and that could be a bummer. I haven’t encountered 
this so far because to this point I have encoded content before uploading 
binary files to the server, but I can envision situations where I would want to 
encode or decode server-side.

I’m puzzled that this problem hasn’t been raised before. Surely the vast 
majority of website host providers use Linux servers, and the Dreamhost 
configuration for shared hosting is most likely standard. So has no-one in 
Europe (or Asia..) using LC Server wanted to create native-language filenames? 
I think LC Server is a magnificent tool, but perhaps it is not as widely used 
as it deserves! Or: they all found the fix and haven’t told us.

> So, when you run lc-server from a terminal session directly, it's almost 
> certainly the case that the LC_ALL and LANG environment variables are 
> set to en_US.UTF-8 (or some other language code DOT UTF-8 - it is the 
> UTF-8 which is the important bit).
> 
> On Linux, a C API nl_langinfo() is used to fetch the encoding to use 
> when talking to the system APIs (e.g. filesystem APIs) - this (I 
> believe) derives its information from LANG/LC_ALL.
> 
> If the latter *are not set* then it will likely default to the 'C' 
> locale which has no interpretation of any non-ascii chars, and thus 
> attempts to encode/decode utf-8 encoded filenames will fail.
> 
> My theory is that these variables are not set in the configuration for 
> running CGIs in Apache (or whatever web server is being used in this 
> instance).
> 
> Digging around it looks like Apache (at least) has a `SetEnv` directive 
> which would allow these environment variables to be set, e.g.
> 
>   SetEnv LC_ALL en_US.UTF-8
>   SetEnv LANG en_US.UTF-8
> 
> Although I'm not 100% sure where such things go, perhaps someone more 
> conversant with apache config could chime in to suggest.

Neville Smythe




___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Linux filenames in LC Server

2023-08-14 Thread matthias rebbe via use-livecode
see below...


> On 14.08.2023 at 13:30, Mark Waddingham via use-livecode 
> wrote:
> 
> On 2023-08-14 12:12, matthias rebbe via use-livecode wrote:
>> Hi Mark,
>> When I read Neville's post, I also thought about normalize, although I really 
>> do not have a clue about the whole Unicode stuff, but I remembered that the 
>> standalone builder makes use of the normalize function. ;)
>> So I used this script on LC Server to write the seconds to a file containing 
>> an a-umlaut in its name.
>> put normalizeText("testä.txt", "NFC") into tFile
>> put the seconds into URL ("binfile:")
>> put the result
>> put ""
>> put the files
>> put ""
>> put tFile
>> But that does not work. "The result" returns 'can't open file'.
> 
> Hmmm - I must confess that I misread Neville's post - he did explicitly 
> mention 'creating' files... The normalization would only arise if the file 
> already existed, but the requested (incoming) filename was normalized 
> differently (thus resulting in the file not being found).
> 
> So assuming that the defaultFolder is accessible in your above script (as a 
> read-only folder would also cause the same error) then there does appear to 
> be something up here...
> 

The default folder is accessible. The same script works when the ä is removed 
from the line
put normalizeText("testä.txt", "NFC") into tFile



> Warmest Regards,
> 
> Mark.
> 
> -- 
> Mark Waddingham ~ m...@livecode.com  ~ 
> http://www.livecode.com/
> LiveCode: Build Amazing Things
> 



Re: Linux filenames in LC Server

2023-08-14 Thread Mark Waddingham via use-livecode

On 2023-08-14 12:30, Mark Waddingham via use-livecode wrote:
> So assuming that the defaultFolder is accessible in your above script 
> (as a read-only folder would also cause the same error) then there does 
> appear to be something up here...


Okay, so I'm pretty sure the Linux server engine is doing the right 
thing.


As mentioned previously, Linux filesystems don't actually care what the 
encoding of a filename is - to Linux it's just a sequence of bytes.


The interpretation is given by the 'locale' settings which are in effect 
for any given program.
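To make the "sequence of bytes" point concrete, here is a small Python illustration (not LiveCode code): the same bytes on disk read as different names depending on which character set is assumed. Latin-1 stands in for a non-UTF-8 locale here, which is an assumption for illustration only.

```python
# The name "Carré.txt" as a Linux filesystem would store it under a UTF-8 locale.
raw = b"Carr\xc3\xa9.txt"

as_utf8 = raw.decode("utf-8")      # interpreted with a UTF-8 locale
as_latin1 = raw.decode("latin-1")  # interpreted with a single-byte locale

print(as_utf8)    # Carré.txt
print(as_latin1)  # CarrÃ©.txt
```

The bytes never change; only the interpretation does, which is why the locale of the process running lc-server matters.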


So, when you run lc-server from a terminal session directly, it's almost 
certainly the case that the LC_ALL and LANG environment variables are 
set to en_US.UTF-8 (or some other language code DOT UTF-8 - it is the 
UTF-8 which is the important bit).


On Linux, a C API nl_langinfo() is used to fetch the encoding to use 
when talking to the system APIs (e.g. filesystem APIs) - this (I 
believe) derives its information from LANG/LC_ALL.
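Python exposes the same C call, so a quick way to see what nl_langinfo() reports for a given locale is the sketch below. It assumes a Unix-like system (`locale.nl_langinfo` is not available on Windows); on glibc the "C" locale typically reports ANSI_X3.4-1968 (the formal name for ASCII), though the exact string varies by C library. The helper name `codeset_for` is hypothetical.

```python
import locale


def codeset_for(locale_name):
    """Return the codeset nl_langinfo(CODESET) reports under locale_name.

    The previous LC_CTYPE setting is restored afterwards. Raises
    locale.Error if locale_name is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


print(codeset_for("C"))  # e.g. ANSI_X3.4-1968 on glibc
```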


If the latter *are not set* then it will likely default to the 'C' 
locale which has no interpretation of any non-ascii chars, and thus 
attempts to encode/decode utf-8 encoded filenames will fail.
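A simplified way to see why the encoding step fails: treat the "C" locale as a 7-bit ASCII codec (an approximation of what the C library does, not a description of LiveCode's internals). Using the `testä.txt` name from Matthias's test, the ASCII attempt cannot represent the name at all, whereas UTF-8 handles it. The helper `encode_filename` is hypothetical.

```python
def encode_filename(name, codeset):
    """Encode a filename for the OS, returning None if the codeset can't represent it."""
    try:
        return name.encode(codeset)
    except UnicodeEncodeError:
        return None


print(encode_filename("test\u00e4.txt", "ascii"))  # None: 'ä' has no ASCII byte
print(encode_filename("test\u00e4.txt", "utf-8"))  # b'test\xc3\xa4.txt'
```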


My theory is that these variables are not set in the configuration for 
running CGIs in Apache (or whatever web server is being used in this 
instance).


Digging around, it looks like Apache (at least) has a `SetEnv` directive 
which would allow these environment variables to be set, e.g.


  SetEnv LC_ALL en_US.UTF-8
  SetEnv LANG en_US.UTF-8

Although I'm not 100% sure where such things go, perhaps someone more 
conversant with apache config could chime in to suggest.
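Where the Apache configuration cannot be edited, the same variables can in principle be set by whatever launches the engine. The Python sketch below only demonstrates the idea of starting a child process with LC_ALL and LANG set; the locale name is an assumption (it must appear in `locale -a` on the server), the helper `run_with_utf8_locale` is hypothetical, and a real deployment would more likely use a small shell wrapper around the lc-server binary.

```python
import os
import subprocess
import sys

# Assumed locale name; check it is installed on the server with `locale -a`.
UTF8_LOCALE = "en_US.UTF-8"


def run_with_utf8_locale(argv):
    """Run argv with LC_ALL and LANG forced to a UTF-8 locale.

    The rest of the parent's environment is inherited unchanged.
    """
    env = dict(os.environ)
    env["LC_ALL"] = UTF8_LOCALE
    env["LANG"] = UTF8_LOCALE
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


# Stand-in child process: print the locale variables it actually received.
result = run_with_utf8_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'], os.environ['LANG'])"]
)
print(result.stdout.strip())  # en_US.UTF-8 en_US.UTF-8
```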


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things



Re: Linux filenames in LC Server

2023-08-14 Thread Mark Waddingham via use-livecode

On 2023-08-14 12:12, matthias rebbe via use-livecode wrote:

> Hi Mark,
>
> When I read Neville's post, I also thought about normalize, although I 
> really do not have a clue about the whole Unicode stuff, but I 
> remembered that the standalone builder makes use of the normalize 
> function. ;)
>
> So I used this script on LC Server to write the seconds to a file 
> containing an a-umlaut in its name.
>
> put normalizeText("testä.txt", "NFC") into tFile
> put the seconds into URL ("binfile:")
> put the result
> put ""
> put the files
> put ""
> put tFile
>
> But that does not work. "The result" returns 'can't open file'.


Hmmm - I must confess that I misread Neville's post - he did explicitly 
mention 'creating' files... The normalization would only arise if the 
file already existed, but the requested (incoming) filename was 
normalized differently (thus resulting in the file not being found).


So assuming that the defaultFolder is accessible in your above script 
(as a read-only folder would also cause the same error) then there does 
appear to be something up here...


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things



Re: Linux filenames in LC Server

2023-08-14 Thread matthias rebbe via use-livecode
Hi Mark,

When I read Neville's post, I also thought about normalize, although I really do 
not have a clue about the whole Unicode stuff, but I remembered that the 
standalone builder makes use of the normalize function. ;)

So I used this script on LC Server to write the seconds to a file containing an 
a-umlaut in its name.

put normalizeText("testä.txt", "NFC") into tFile
put the seconds into URL ("binfile:")
put the result
put ""
put the files
put ""
put tFile

But that does not work. "The result" returns 'can't open file'. 
As I already wrote, I have no clue about Unicode, so I also tried NFD and 
the other two options, but also without success.

Is there something else one has to keep in mind to get this working?


Regards,
Matthias



> On 14.08.2023 at 12:22, Mark Waddingham via use-livecode 
> wrote:
> 
> On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
>> OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the 
>> filename Carré.txt is rendered with two bytes [C3A9] not the single byte 
>> MacRoman encoding. I got tricked by copying the terminal listing into 
>> another program rather than hex dumping within the terminal, and somewhere 
>> in the process the native encoding was preferred.
>> So one must *not* textEncode a filename to utf-8 before writing a file to 
>> disk; LC deals with the encoding, although you *should* textEncode its 
>> contents.
>> Which leaves the problem of why I can’t get LC Server on Linux to write 
>> non-ASCII filenames.
> 
> So I suspect the problem here is normalization, rather than the inability of 
> Linux to write non-ascii filenames.
> 
> Characters such as e-acute / e-grave have *two* representations in Unicode - 
> the decomposed and composed forms.
> 
> The composed form is a direct mapping from the native encodings and is a 
> single codepoint, the decomposed form will be two codepoints - (e, 
> combining-acute/grave)
> 
> Depending on where the string comes from it might either be composed or 
> decomposed - macOS filenames are stored decomposed in the FS, but the 
> higher-level parts of the OS make either form work (in a similar fashion to 
> how macOS filesystems are case-insensitive by default).
> 
> Linux filesystems, however, are both case-sensitive and form-sensitive - a 
> filename must match byte to byte with what it was created with (indeed, Linux 
> filesystems care nothing for encodings; they see filenames as a sequence of 
> bytes which are interpreted relative to the user's current locale - the 
> default locale on Linux these days is UTF-8).
> 
> If your app is managing the files completely on Linux (i.e. it is creating / 
> deleting them and the filenames are not user-editable) then (if this is the 
> case) the problem should be fixable by choosing a normalization form when 
> you create / lookup the file - i.e. pass all filenames on the server through 
> `normalizeText(, )` - here you want form to be either "NFC" 
> (composed) or "NFD" (decomposed).
> 
> Warmest Regards,
> 
> Mark.
> 
> P.S. For all the gory details about Unicode normalization forms see - 
> https://unicode.org/reports/tr15/
> 
> -- 
> Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
> LiveCode: Build Amazing Things
> 




Re: Linux filenames in LC Server

2023-08-14 Thread Mark Waddingham via use-livecode

On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
> OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in 
> the filename Carré.txt is rendered with two bytes [C3A9] not the single 
> byte MacRoman encoding. I got tricked by copying the terminal listing 
> into another program rather than hex dumping within the terminal, and 
> somewhere in the process the native encoding was preferred.
>
> So one must *not* textEncode a filename to utf-8 before writing a file 
> to disk; LC deals with the encoding, although you *should* textEncode 
> its contents.
>
> Which leaves the problem of why I can’t get LC Server on Linux to write 
> non-ASCII filenames.


So I suspect the problem here is normalization, rather than the 
inability of Linux to write non-ascii filenames.


Characters such as e-acute / e-grave have *two* representations in 
Unicode - the decomposed and composed forms.


The composed form is a direct mapping from the native encodings and is a 
single codepoint, the decomposed form will be two codepoints - (e, 
combining-acute/grave)
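Python's standard `unicodedata` module shows the two forms side by side (LiveCode's normalizeText plays the same role in the scripts discussed in this thread). This is only an illustration of the code points and the resulting UTF-8 bytes.

```python
import unicodedata

e_acute = "\u00e9"  # é

nfc = unicodedata.normalize("NFC", e_acute)  # composed: one code point
nfd = unicodedata.normalize("NFD", e_acute)  # decomposed: base letter + combining accent

print([f"U+{ord(c):04X}" for c in nfc])  # ['U+00E9']
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0065', 'U+0301']
print(nfc.encode("utf-8"))  # b'\xc3\xa9'
print(nfd.encode("utf-8"))  # b'e\xcc\x81'
```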


Depending on where the string comes from it might either be composed or 
decomposed - macOS filenames are stored decomposed in the FS, but the 
higher-level parts of the OS make either form work (in a similar fashion 
to how macOS filesystems are case-insensitive by default).


Linux filesystems, however, are both case-sensitive and form-sensitive - 
a filename must match byte to byte with what it was created with 
(indeed, Linux filesystems care nothing for encodings; they see 
filenames as a sequence of bytes which are interpreted relative to the 
user's current locale - the default locale on Linux these days is 
UTF-8).
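Because a Linux lookup compares bytes, the two spellings of the same visible name behave as two different files. A small Python simulation, using a set of byte strings as a stand-in for a directory (an assumption that keeps it independent of the local filesystem, since macOS may treat the forms as equivalent):

```python
import unicodedata

name = "Carr\u00e9.txt"  # Carré.txt
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

# Stand-in for a Linux directory: the file was created with the NFC spelling.
directory = {nfc_bytes}

print(nfc_bytes in directory)  # True
print(nfd_bytes in directory)  # False: same-looking name, different bytes
print("carr\u00e9.txt".encode("utf-8") in directory)  # False: case differs too
```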


If your app is managing the files completely on Linux (i.e. it is 
creating / deleting them and the filenames are not user-editable) then 
(if this is the case) the problem should be fixable by choosing a 
normalization form when you create / lookup the file - i.e. pass all 
filenames on the server through `normalizeText(, )` - here 
you want form to be either "NFC" (composed) or "NFD" (decomposed).
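A minimal sketch of that advice in Python, assuming NFC is the chosen form and that every create and lookup goes through one helper (the name `server_path` is hypothetical; in LiveCode the equivalent step would be a normalizeText call before using the name):

```python
import os
import tempfile
import unicodedata

FORM = "NFC"  # choose one form and use it for every create and lookup


def server_path(folder, name):
    """Hypothetical helper: normalize an incoming filename before use."""
    return os.path.join(folder, unicodedata.normalize(FORM, name))


with tempfile.TemporaryDirectory() as folder:
    # Create the file using the decomposed spelling (e + combining acute)...
    with open(server_path(folder, "Carre\u0301.txt"), "w", encoding="utf-8") as f:
        f.write("hello")
    # ...then find it again using the composed spelling.
    found = os.path.exists(server_path(folder, "Carr\u00e9.txt"))

print(found)  # True
```

Which form is chosen matters less than applying the same one consistently at the single point where names enter the server code.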


Warmest Regards,

Mark.

P.S. For all the gory details about Unicode normalization forms see - 
https://unicode.org/reports/tr15/


--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things



Re: Linux filenames in LC Server

2023-08-13 Thread Neville Smythe via use-livecode
OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the 
filename Carré.txt is rendered with two bytes [C3A9] not the single byte 
MacRoman encoding. I got tricked by copying the terminal listing into another 
program rather than hex dumping within the terminal, and somewhere in the 
process the native encoding was preferred. 
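For anyone repeating the check, comparing the bytes directly avoids the copy-and-paste trap described above. In Python, for example, the two candidate encodings of é are easy to tell apart (`mac_roman` is Python's name for the MacRoman codec):

```python
e_acute = "\u00e9"  # é (composed form)

print(e_acute.encode("utf-8").hex())      # c3a9: the two bytes seen in the hex dump
print(e_acute.encode("mac_roman").hex())  # 8e: the single MacRoman byte
```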

So one must *not* textEncode a filename to utf-8 before writing a file to disk; 
LC deals with the encoding, although you *should* textEncode its contents.
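A rough Python analogy of why pre-encoding the filename goes wrong: if a name is already UTF-8 bytes and the engine then treats each byte as one native character and encodes again for the filesystem, the name ends up encoded twice. The Latin-1 step below is an assumption standing in for the engine's native-text handling, not a description of LiveCode internals.

```python
name = "Carr\u00e9.txt"  # Carré.txt

# Step 1: the script encodes the filename itself (the mistake).
pre_encoded = name.encode("utf-8")  # b'Carr\xc3\xa9.txt'

# Step 2 (assumed): each byte is taken as one native (Latin-1-like) character...
as_native_text = pre_encoded.decode("latin-1")  # 'CarrÃ©.txt'

# Step 3: ...and encoded to UTF-8 again on the way to the filesystem.
on_disk = as_native_text.encode("utf-8")

print(on_disk)                  # b'Carr\xc3\x83\xc2\xa9.txt'
print(on_disk.decode("utf-8"))  # CarrÃ©.txt
```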

Which leaves the problem of why I can’t get LC Server on Linux to write 
non-ASCII filenames.

Neville Smythe



