Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread Enrico Forestieri

On Thu, Jan 25, 2024 at 12:40:07PM -0500, Richard Kimberly Heck wrote:


That does make me wonder whether there are other versions of this 
problem. Are there other cases where we generate files in this way?


I think we will discover that very quickly ;)

--
Enrico
--
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel


Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread Richard Kimberly Heck

On 1/25/24 04:29, Enrico Forestieri wrote:

On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:

After converting the file to UTF-8, everything works fine.

Fixed at 48a065e8


Thanks!

Riki




Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread Richard Kimberly Heck

On 1/25/24 04:35, José Matos wrote:

On Thu, 2024-01-25 at 09:31 +0100, Enrico Forestieri wrote:

After investigating this I now know why. Capturing the generated
script in a file reveals that it is actually encoded in an 8-bit
encoding on Windows, despite the fact that the first line of the
script says it is encoded in utf-8.

Basically, it comes down to the difference between bytes and strings. In
Python 2 they are the same.

In Python 3, the line stating that the content is utf8 is redundant,
since source files are assumed to be in that encoding by default.

https://docs.python.org/3/howto/unicode.html#the-string-type


That does make me wonder whether there are other versions of this 
problem. Are there other cases where we generate files in this way?


Riki




Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread José Matos
On Thu, 2024-01-25 at 10:29 +0100, Enrico Forestieri wrote:
> On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:
> > 
> > After converting the file to UTF-8, everything works fine.
> 
> Fixed at 48a065e8
> 
> -- 
> Enrico

Thank you. :-)
-- 
José Abílio


Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread José Matos
On Thu, 2024-01-25 at 09:31 +0100, Enrico Forestieri wrote:
> After investigating this I now know why. Capturing the generated
> script in a file reveals that it is actually encoded in an 8-bit
> encoding on Windows, despite the fact that the first line of the
> script says it is encoded in utf-8.

Basically, it comes down to the difference between bytes and strings. In
Python 2 they are the same.

In Python 3, the line stating that the content is utf8 is redundant,
since source files are assumed to be in that encoding by default.

https://docs.python.org/3/howto/unicode.html#the-string-type
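
A minimal sketch of the bytes/string distinction in Python 3, using the
file name from this thread. The choice of Latin-1 as the "8 bit" encoding
is an assumption made only for illustration:

```python
# In Python 3, str holds Unicode code points and bytes holds raw octets.
# The file name is written with escapes so that this snippet does not
# depend on how the snippet itself is encoded.
text = "p\u00edk\u00e0.pdf"          # "píkà.pdf"

as_utf8 = text.encode("utf-8")       # accented letters take two bytes each
as_latin1 = text.encode("latin-1")   # assumed 8-bit encoding: one byte each

print(type(text).__name__, len(text))
print(type(as_utf8).__name__, len(as_utf8), as_utf8)
print(type(as_latin1).__name__, len(as_latin1), as_latin1)

# The same text yields different byte sequences depending on the encoding,
# so a reader has to know which encoding was used when writing.
same_bytes = as_utf8 == as_latin1
```

The two byte strings differ, which is why a script written in one
encoding but declared as another cannot be read back reliably.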


-- 
José Abílio


Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread Enrico Forestieri

On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:


After converting the file to UTF-8, everything works fine.


Fixed at 48a065e8

--
Enrico


Re: RC1: Unable to read files with accents in names on Windows

2024-01-25 Thread Enrico Forestieri

On Tue, Jan 23, 2024 at 09:15:30PM +0100, Enrico Forestieri wrote:


On Tue, Jan 23, 2024 at 02:58:37PM -0500, Richard Kimberly Heck wrote:


On 1/23/24 14:51, Enrico Forestieri wrote:

On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:


The conversion script has:

infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented 
characters have been stripped. This is not a surprise, though, 
since toFilesystemEncoding has a comment that says it does not 
work with non-ASCII characters on Windows. The puzzle is why 
this worked on 2.3.7. None of this code has changed, so far as I 
can see.


Note that this might be a red herring. toFilesystemEncoding only 
encodes file names and not the content of a file that is utf8. The 
fact that the accented characters seem to have disappeared may 
simply be due to the fact the terminal on Windows is not able to 
display utf8 characters.


So, the output should be redirected to a file to be sure.


Can you explain to Didier how to do that on Windows?


This may be impossible because the distributed version of lyx is a gui 
application and prints nothing to the terminal.


Thibaut said he could reproduce. I'm reattaching his test files and 
logs. The logs were copied from the Messages pane, he said.


Yes, I get the same mangled name in the message pane:

infile = "C:/work/test/p�k�.pdf"

but everything works fine for me. Seemingly, the problem with the 
message pane is peculiar to the Windows native version because on 
cygwin I instead correctly get:


infile = "/c/work/test/píkà.pdf"

However, I don't know why that is so.


After investigating this I now know why. Capturing the generated script 
in a file reveals that it is actually encoded in an 8-bit encoding on 
Windows, despite the fact that the first line of the script says it is 
encoded in utf-8.


It has always been like that, so why does it not work anymore? The 
answer is that the script works with Python2 but not with Python3.

Trying to run it with Python2 everything works, whereas Python3 gives:
$ /c/Progra~1/LyX/Python/python.exe conv.py
  File "C:\work\test\conv.py", line 11
    infile = "C:/work/test/p�k�.pdf"
    ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xed in position 14: invalid continuation byte


After converting the file to UTF-8, everything works fine.
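
The failure can be reproduced outside LyX with a short sketch. It assumes
the 8-bit encoding is Latin-1 (cp1252 produces the same bytes for these
two letters); the thread does not say which code page was actually used:

```python
# The path from the generated script, written with escapes so that the
# snippet does not depend on its own file encoding ("píkà.pdf").
path = "C:/work/test/p\u00edk\u00e0.pdf"

# Assumption: this is what the 8-bit encoded script contains for the path.
raw = path.encode("latin-1")

# Python 3 reads the script as UTF-8, so it effectively attempts this:
try:
    raw.decode("utf-8")
    bad_offset = None
except UnicodeDecodeError as exc:
    bad_offset = exc.start        # offset of the offending byte in the path
    bad_byte = raw[exc.start]     # the byte that is not valid UTF-8 here
    reason = exc.reason
    print(f"byte {bad_byte:#x} at position {bad_offset}: {reason}")

# Re-encoding the same text as UTF-8 gives bytes that Python 3 accepts.
fixed = raw.decode("latin-1").encode("utf-8")
roundtrip = fixed.decode("utf-8")
```

The offending byte and position match the SyntaxError reported above,
which is consistent with the script having been written in an 8-bit
encoding rather than UTF-8.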

--
Enrico


Re: RC1: Unable to read files with accents in names on Windows

2024-01-23 Thread Enrico Forestieri

On Tue, Jan 23, 2024 at 02:58:37PM -0500, Richard Kimberly Heck wrote:


On 1/23/24 14:51, Enrico Forestieri wrote:

On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:


The conversion script has:

infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented 
characters have been stripped. This is not a surprise, though, 
since toFilesystemEncoding has a comment that says it does not 
work with non-ASCII characters on Windows. The puzzle is why this 
worked on 2.3.7. None of this code has changed, so far as I can 
see.


Note that this might be a red herring. toFilesystemEncoding only 
encodes file names and not the content of a file that is utf8. The 
fact that the accented characters seem to have disappeared may 
simply be due to the fact the terminal on Windows is not able to 
display utf8 characters.


So, the output should be redirected to a file to be sure.


Can you explain to Didier how to do that on Windows?


This may be impossible because the distributed version of lyx is a gui 
application and prints nothing to the terminal.


Thibaut said he could reproduce. I'm reattaching his test files and 
logs. The logs were copied from the Messages pane, he said.


Yes, I get the same mangled name in the message pane:

infile = "C:/work/test/p�k�.pdf"

but everything works fine for me. Seemingly, the problem with the 
message pane is peculiar to the Windows native version because on cygwin 
I instead correctly get:


infile = "/c/work/test/píkà.pdf"

However, I don't know why that is so.

--
Enrico


Re: RC1: Unable to read files with accents in names on Windows

2024-01-23 Thread Enrico Forestieri

On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:


The conversion script has:

infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented characters 
have been stripped. This is not a surprise, though, since 
toFilesystemEncoding has a comment that says it does not work with 
non-ASCII characters on Windows. The puzzle is why this worked on 
2.3.7. None of this code has changed, so far as I can see.


Note that this might be a red herring. toFilesystemEncoding only encodes 
file names and not the content of a file that is utf8. The fact that the 
accented characters seem to have disappeared may simply be due to the 
fact that the terminal on Windows is not able to display utf8 characters.


So, the output should be redirected to a file to be sure.
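
Once the script has been captured in a file, its raw bytes can be checked
directly instead of trusting what a terminal or message pane displays. A
sketch of such a check follows; the helper name is_valid_utf8, the file
names, and the use of Latin-1 as the 8-bit encoding are illustrative
assumptions, not part of LyX:

```python
import os
import tempfile


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# One line of a conversion script, with the accented path from the thread.
line = 'infile = "C:/work/test/p\u00edk\u00e0.pdf"\n'

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "latin-1"):
        captured = os.path.join(tmp, f"conv_{enc}.py")
        # Write the line in the given encoding, as a captured script might be.
        with open(captured, "wb") as fh:
            fh.write(line.encode(enc))
        # Read the raw bytes back and test them, bypassing any display layer.
        with open(captured, "rb") as fh:
            results[enc] = is_valid_utf8(fh.read())

print(results)
```

Reading in binary mode matters here: opening the file in text mode would
apply a locale-dependent decoding and could hide the problem.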

--
Enrico


Re: RC1: Unable to read files with accents in names on Windows

2024-01-23 Thread Enrico Forestieri

On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:


On 1/22/24 17:53, Thibaut Cuvelier wrote:
On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:


   On 1/22/24 16:53, didiergab...@free.fr wrote:

   I also realize that I can no longer load images whose names
   contain accents. In any case, that’s what’s happening with the
   file I just sent you. If I rename the file: SchemaCinematique.pdf
   to SchemaCinématique.pdf then I can read “Error converting to a
   readable format.”


   That's a serious bug. Can anyone on Windows check this?

I can reproduce with PDF files whose names have accents, but not PNG 
(with the same file name apart from the extension). If I export the 
file to LyX 2.3 and load it with LyX 2.3.7, the PDF file doesn't 
have any issue (with MikTeX, up to date).


I'm attaching the logs (View > Messages Pane, with all logs enabled) 
and the corresponding test files (LyX 2.3 and 2.4).


The conversion script has:

infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented characters 
have been stripped. This is not a surprise, though, since 
toFilesystemEncoding has a comment that says it does not work with 
non-ASCII characters on Windows. The puzzle is why this worked on 
2.3.7. None of this code has changed, so far as I can see.


Enrico, do you know if there is a reason not to use 
toSafeFilesystemEncoding here instead? This is in 
GraphicsConverter.cpp, line 139.


I am not able to reproduce the problem. We should try to understand what 
the real issue is before changing the source.


--
Enrico


Re: RC1: Unable to read files with accents in names on Windows

2024-01-22 Thread Richard Kimberly Heck

On 1/22/24 17:53, Thibaut Cuvelier wrote:
On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:


On 1/22/24 16:53, didiergab...@free.fr wrote:

I also realize that I can no longer load images whose names
contain accents. In any case, that’s what’s happening with the
file I just sent you. If I rename the file: SchemaCinematique.pdf
to SchemaCinématique.pdf then I can read “Error converting to a
readable format.”


That's a serious bug. Can anyone on Windows check this?

I can reproduce with PDF files whose names have accents, but not PNG 
(with the same file name apart from the extension). If I export the 
file to LyX 2.3 and load it with LyX 2.3.7, the PDF file doesn't have 
any issue (with MikTeX, up to date).


I'm attaching the logs (View > Messages Pane, with all logs enabled) 
and the corresponding test files (LyX 2.3 and 2.4).


The conversion script has:

infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented characters 
have been stripped. This is not a surprise, though, since 
toFilesystemEncoding has a comment that says it does not work with 
non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. 
None of this code has changed, so far as I can see.


Enrico, do you know if there is a reason not to use 
toSafeFilesystemEncoding here instead? This is in GraphicsConverter.cpp, 
line 139.


Riki



Re: RC1: Unable to read files with accents in names on Windows

2024-01-22 Thread Thibaut Cuvelier
On Mon, 22 Jan 2024 at 23:53, Thibaut Cuvelier wrote:

> On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:
>
>> On 1/22/24 16:53, didiergab...@free.fr wrote:
>>
>> I also realize that I can no longer load images whose names contain accents.
>> In any case, that’s what’s happening with the file I just sent you. If I 
>> rename the file:
>> SchemaCinematique.pdf to SchemaCinématique.pdf
>> then I can read “Error converting to a readable format.”
>>
>> That's a serious bug. Can anyone on Windows check this?
>>
> I can reproduce with PDF files whose names have accents, but not PNG (with
> the same file name apart from the extension). If I export the file to LyX
> 2.3 and load it with LyX 2.3.7, the PDF file doesn't have any issue (with
> MikTeX, up to date).
>
> I'm attaching the logs (View > Messages Pane, with all logs enabled) and
> the corresponding test files (LyX 2.3 and 2.4).
>

Another data point: in the temporary folder LyX uses for this document
(lyx_tmpdir.bsUGvNMjILjF), I have six files, all of them empty (size: zero
bytes).

PS C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF> ls -R


Directory: C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF


Mode    LastWriteTime      Length  Name
----    -------------      ------  ----
d-      22-Jan-24 23:47            lyx_tmpbuf0
-a      22-Jan-24 23:43         0  CacheItem.EtXrBH
-a      22-Jan-24 23:44         0  CacheItem.nbPvek
-a      22-Jan-24 23:42         0  CacheItem.OYeYzR
-a      22-Jan-24 23:42         0  gconvertAvdsxC.pdf
-a      22-Jan-24 23:43         0  gconvertFgslXV.pdf
-a      22-Jan-24 23:44         0  gconvertIdshbV.pdf


Directory:
C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF\lyx_tmpbuf0


Mode    LastWriteTime      Length  Name
----    -------------      ------  ----
-a      22-Jan-24 23:47      2598  test.23.lyx
-a      22-Jan-24 23:47      2938  test.lyx