Re: RC1: Unable to read files with accents in names on Windows
On Thu, Jan 25, 2024 at 12:40:07PM -0500, Richard Kimberly Heck wrote:
> That does make me wonder whether there are other versions of this problem.
> Are there other cases where we generate files in this way?

I think we will discover that very quickly ;)

--
Enrico
--
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel
Re: RC1: Unable to read files with accents in names on Windows
On 1/25/24 04:29, Enrico Forestieri wrote:
> On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:
> > After converting the file to utf-8, everything works fine.
>
> Fixed at 48a065e8

Thanks!

Riki
Re: RC1: Unable to read files with accents in names on Windows
On 1/25/24 04:35, José Matos wrote:
> On Thu, 2024-01-25 at 09:31 +0100, Enrico Forestieri wrote:
> > After investigating this I now know why. Capturing the generated
> > script in a file reveals that it is actually encoded in an 8-bit
> > encoding on Windows, despite the fact that the first line of the
> > script says it is encoded in utf-8.
>
> Basically it comes down to the difference between bytes and strings. In Python 2 they are the same. In Python 3 the line that states that the content is utf8 is redundant, since UTF-8 is already the default encoding for source files.
>
> https://docs.python.org/3/howto/unicode.html#the-string-type

That does make me wonder whether there are other versions of this problem. Are there other cases where we generate files in this way?

Riki
Re: RC1: Unable to read files with accents in names on Windows
On Thu, 2024-01-25 at 10:29 +0100, Enrico Forestieri wrote:
> On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:
> >
> > After converting the file to utf-8, everything works fine.
>
> Fixed at 48a065e8
>
> --
> Enrico

Thank you. :-)

--
José Abílio
Re: RC1: Unable to read files with accents in names on Windows
On Thu, 2024-01-25 at 09:31 +0100, Enrico Forestieri wrote:
> After investigating this I now know why. Capturing the generated
> script in a file reveals that it is actually encoded in an 8-bit
> encoding on Windows, despite the fact that the first line of the
> script says it is encoded in utf-8.

Basically it comes down to the difference between bytes and strings. In Python 2 they are the same. In Python 3 the line that states that the content is utf8 is redundant, since UTF-8 is already the default encoding for source files.

https://docs.python.org/3/howto/unicode.html#the-string-type

--
José Abílio
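To make the bytes-versus-string point concrete, here is a minimal Python 3 sketch. It assumes the 8-bit encoding on Windows is cp1252; the thread does not name the code page, so that choice is only an illustration. The file name is the one from the test case.

```python
# Assumption: cp1252 stands in for the unnamed 8-bit Windows encoding.
name = "píkà.pdf"

# In Python 3 a str is text; bytes only exist once an encoding is chosen.
as_utf8 = name.encode("utf-8")      # accented letters take two bytes each
as_cp1252 = name.encode("cp1252")   # accented letters take one byte each

print(as_utf8)
print(as_cp1252)

# Decoding with the encoding that produced the bytes round-trips cleanly.
round_trip = as_cp1252.decode("cp1252")

# Decoding 8-bit bytes as UTF-8 fails, which is the mismatch at issue here.
try:
    as_cp1252.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc.reason

print(mismatch_error)
```

The same text therefore produces two different byte sequences, and only one of them is valid input for a reader that expects UTF-8.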
Re: RC1: Unable to read files with accents in names on Windows
On Thu, Jan 25, 2024 at 09:31:00AM +0100, Enrico Forestieri wrote:
> After converting the file to utf-8, everything works fine.

Fixed at 48a065e8

--
Enrico
Re: RC1: Unable to read files with accents in names on Windows
On Tue, Jan 23, 2024 at 09:15:30PM +0100, Enrico Forestieri wrote:
> On Tue, Jan 23, 2024 at 02:58:37PM -0500, Richard Kimberly Heck wrote:
> > On 1/23/24 14:51, Enrico Forestieri wrote:
> > > On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:
> > > > The conversion script has:
> > > >
> > > >     infile = "C:/Users/Thibaut/Desktop/p k .pdf"
> > > >
> > > > when the input file was: ~\Desktop\píkà.pdf. The accented characters have been stripped. This is not a surprise, though, since toFilesystemEncoding has a comment that says it does not work with non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. None of this code has changed, so far as I can see.
> > >
> > > Note that this might be a red herring. toFilesystemEncoding only encodes file names and not the content of a file that is utf8. The fact that the accented characters seem to have disappeared may simply be due to the fact the terminal on Windows is not able to display utf8 characters. So, the output should be redirected to a file to be sure.
> >
> > Can you explain to Didier how to do that on Windows?
>
> This may be impossible because the distributed version of lyx is a gui application and prints nothing to the terminal.
>
> > Thibaut said he could reproduce. I'm reattaching his test files and logs. The logs were copied from the Messages pane, he said.
>
> Yes, I get the same mangled name in the message pane:
>
>     infile = "C:/work/test/p�k�.pdf"
>
> but everything works fine for me. Seemingly, the problem with the message pane is peculiar to the Windows native version because on cygwin I instead correctly get:
>
>     infile = "/c/work/test/píkà.pdf"
>
> However, I don't know why that is so.

After investigating this I now know why. Capturing the generated script in a file reveals that it is actually encoded in an 8-bit encoding on Windows, despite the fact that the first line of the script says it is encoded in utf-8. It has always been like that, so why does it not work anymore? The answer is that the script works with Python2 but not with Python3. Trying to run it with Python2 everything works, whereas Python3 gives:

    $ /c/Progra~1/LyX/Python/python.exe conv.py
      File "C:\work\test\conv.py", line 11
        infile = "C:/work/test/p�k�.pdf"
        ^
    SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xed in position 14: invalid continuation byte

After converting the file to utf-8, everything works fine.

--
Enrico
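The failure described above can be reproduced outside LyX with a short sketch. Everything here is a stand-in: SCRIPT_TEXT is a hypothetical three-line substitute for the generated conversion script, run_script is a helper invented for this example, and cp1252 is assumed as the 8-bit encoding because the thread does not say which code page was in use.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the generated script: a utf-8 coding line
# followed by a string literal that contains accented characters.
SCRIPT_TEXT = (
    "# -*- coding: utf-8 -*-\n"
    'infile = "C:/work/test/píkà.pdf"\n'
    "print(len(infile))\n"
)


def run_script(encoding: str) -> int:
    """Write SCRIPT_TEXT to disk in `encoding`, run it, return the exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "conv.py"
        script.write_bytes(SCRIPT_TEXT.encode(encoding))
        result = subprocess.run([sys.executable, str(script)], capture_output=True)
        return result.returncode


# Assumption: cp1252 plays the role of the 8-bit Windows encoding.
legacy_exit = run_script("cp1252")  # bytes contradict the coding line
utf8_exit = run_script("utf-8")     # bytes agree with the coding line

print("cp1252 exit code:", legacy_exit)
print("utf-8 exit code:", utf8_exit)
```

Under Python 3 the first run should exit with a non-zero status because the interpreter cannot decode the source, while the second run completes normally. That matches the observation that converting the file to utf-8 makes everything work.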
Re: RC1: Unable to read files with accents in names on Windows
On Tue, Jan 23, 2024 at 02:58:37PM -0500, Richard Kimberly Heck wrote:
> On 1/23/24 14:51, Enrico Forestieri wrote:
> > On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:
> > > The conversion script has:
> > >
> > >     infile = "C:/Users/Thibaut/Desktop/p k .pdf"
> > >
> > > when the input file was: ~\Desktop\píkà.pdf. The accented characters have been stripped. This is not a surprise, though, since toFilesystemEncoding has a comment that says it does not work with non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. None of this code has changed, so far as I can see.
> >
> > Note that this might be a red herring. toFilesystemEncoding only encodes file names and not the content of a file that is utf8. The fact that the accented characters seem to have disappeared may simply be due to the fact the terminal on Windows is not able to display utf8 characters. So, the output should be redirected to a file to be sure.
>
> Can you explain to Didier how to do that on Windows?

This may be impossible because the distributed version of lyx is a gui application and prints nothing to the terminal.

> Thibaut said he could reproduce. I'm reattaching his test files and logs. The logs were copied from the Messages pane, he said.

Yes, I get the same mangled name in the message pane:

    infile = "C:/work/test/p�k�.pdf"

but everything works fine for me. Seemingly, the problem with the message pane is peculiar to the Windows native version because on cygwin I instead correctly get:

    infile = "/c/work/test/píkà.pdf"

However, I don't know why that is so.

--
Enrico
Re: RC1: Unable to read files with accents in names on Windows
On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:
> The conversion script has:
>
>     infile = "C:/Users/Thibaut/Desktop/p k .pdf"
>
> when the input file was: ~\Desktop\píkà.pdf. The accented characters have been stripped. This is not a surprise, though, since toFilesystemEncoding has a comment that says it does not work with non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. None of this code has changed, so far as I can see.

Note that this might be a red herring. toFilesystemEncoding only encodes file names and not the content of a file that is utf8. The fact that the accented characters seem to have disappeared may simply be due to the fact the terminal on Windows is not able to display utf8 characters. So, the output should be redirected to a file to be sure.

--
Enrico
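Once the output has been captured in a file, one way to rule out a display problem is to look at the raw bytes instead of the rendered text. The sketch below is only illustrative: the file name captured.txt, the helper first_invalid_utf8_offset, and the choice of cp1252 for the sample bytes are all assumptions made for this example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Hypothetical captured output; cp1252 is an assumed 8-bit encoding.
captured = 'infile = "C:/work/test/píkà.pdf"\n'.encode("cp1252")

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "captured.txt"
    log.write_bytes(captured)
    raw = log.read_bytes()  # raw bytes, independent of how a terminal draws them

offset = first_invalid_utf8_offset(raw)
if offset is None:
    print("file is valid UTF-8")
else:
    print(f"first non-UTF-8 byte 0x{raw[offset]:02x} at offset {offset}")
```

If the helper returns None, the file really is UTF-8 and any mangling seen on screen comes from the display; otherwise the reported byte shows where the content itself departs from UTF-8.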
Re: RC1: Unable to read files with accents in names on Windows
On Mon, Jan 22, 2024 at 06:21:42PM -0500, Richard Kimberly Heck wrote:
> On 1/22/24 17:53, Thibaut Cuvelier wrote:
> > On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:
> > > On 1/22/24 16:53, didiergab...@free.fr wrote:
> > > > I also realize that I can no longer load images whose names contain accents. In any case, that’s what’s happening with the file I just sent you. If I rename the file:
> > > > SchemaCinematique.pdf to SchemaCinématique.pdf
> > > > then I can read “Error converting to a readable format.”
> > >
> > > That's a serious bug. Can anyone on Windows check this?
> >
> > I can reproduce with PDF files whose names have accents, but not PNG (with the same file name apart from the extension). If I export the file to LyX 2.3 and load it with LyX 2.3.7, the PDF file doesn't have any issue (with MikTeX, up to date).
> >
> > I'm attaching the logs (View > Messages Pane, with all logs enabled) and the corresponding test files (LyX 2.3 and 2.4).
>
> The conversion script has:
>
>     infile = "C:/Users/Thibaut/Desktop/p k .pdf"
>
> when the input file was: ~\Desktop\píkà.pdf. The accented characters have been stripped. This is not a surprise, though, since toFilesystemEncoding has a comment that says it does not work with non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. None of this code has changed, so far as I can see.
>
> Enrico, do you know if there is a reason not to use toSafeFilesystemEncoding here instead? This is in GraphicsConverter.cpp, line 139.

I am not able to reproduce the problem. We should try to understand what the real issue is before changing the source.

--
Enrico
Re: RC1: Unable to read files with accents in names on Windows
On 1/22/24 17:53, Thibaut Cuvelier wrote:
> On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:
> > On 1/22/24 16:53, didiergab...@free.fr wrote:
> > > I also realize that I can no longer load images whose names contain accents. In any case, that’s what’s happening with the file I just sent you. If I rename the file:
> > > SchemaCinematique.pdf to SchemaCinématique.pdf
> > > then I can read “Error converting to a readable format.”
> >
> > That's a serious bug. Can anyone on Windows check this?
>
> I can reproduce with PDF files whose names have accents, but not PNG (with the same file name apart from the extension). If I export the file to LyX 2.3 and load it with LyX 2.3.7, the PDF file doesn't have any issue (with MikTeX, up to date).
>
> I'm attaching the logs (View > Messages Pane, with all logs enabled) and the corresponding test files (LyX 2.3 and 2.4).

The conversion script has:

    infile = "C:/Users/Thibaut/Desktop/p k .pdf"

when the input file was: ~\Desktop\píkà.pdf. The accented characters have been stripped. This is not a surprise, though, since toFilesystemEncoding has a comment that says it does not work with non-ASCII characters on Windows. The puzzle is why this worked on 2.3.7. None of this code has changed, so far as I can see.

Enrico, do you know if there is a reason not to use toSafeFilesystemEncoding here instead? This is in GraphicsConverter.cpp, line 139.

Riki
Re: RC1: Unable to read files with accents in names on Windows
On Mon, 22 Jan 2024 at 23:53, Thibaut Cuvelier wrote:
> On Mon, 22 Jan 2024 at 23:00, Richard Kimberly Heck wrote:
>
>> On 1/22/24 16:53, didiergab...@free.fr wrote:
>>
>> I also realize that I can no longer load images whose names contain accents.
>> In any case, that’s what’s happening with the file I just sent you. If I
>> rename the file:
>> SchemaCinematique.pdf to SchemaCinématique.pdf
>> then I can read “Error converting to a readable format.”
>>
>> That's a serious bug. Can anyone on Windows check this?
>>
> I can reproduce with PDF files whose names have accents, but not PNG (with
> the same file name apart from the extension). If I export the file to LyX
> 2.3 and load it with LyX 2.3.7, the PDF file doesn't have any issue (with
> MikTeX, up to date).
>
> I'm attaching the logs (View > Messages Pane, with all logs enabled) and
> the corresponding test files (LyX 2.3 and 2.4).
>

Another data point: in the temporary folder LyX uses for this document (lyx_tmpdir.bsUGvNMjILjF), I have six files, all of them empty (size: zero bytes).

    PS C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF> ls -R

    Directory: C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF

    Mode  LastWriteTime     Length  Name
    d-    22-Jan-24 23:47           lyx_tmpbuf0
    -a    22-Jan-24 23:43        0  CacheItem.EtXrBH
    -a    22-Jan-24 23:44        0  CacheItem.nbPvek
    -a    22-Jan-24 23:42        0  CacheItem.OYeYzR
    -a    22-Jan-24 23:42        0  gconvertAvdsxC.pdf
    -a    22-Jan-24 23:43        0  gconvertFgslXV.pdf
    -a    22-Jan-24 23:44        0  gconvertIdshbV.pdf

    Directory: C:\Users\Thibaut\AppData\Local\Temp\lyx_tmpdir.bsUGvNMjILjF\lyx_tmpbuf0

    Mode  LastWriteTime     Length  Name
    -a    22-Jan-24 23:47     2598  test.23.lyx
    -a    22-Jan-24 23:47     2938  test.lyx