Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Hi,

On Wed, Aug 24, 2022 at 12:03, Vaclav Petras wrote:

>> +1 for switching to UTF-8
>>
>>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
>>> system-dependent encoding. However, the source code it is using should be
>>> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>>
>> I am not sure why the parser should emit HTML in system-dependent
>> encoding. Why simply not use UTF-8 as suggested in PR [1]?
>
> It should emit UTF-8, I don't know what it does now.

I have created a new PR [1].

Martin

[1] https://github.com/OSGeo/grass/pull/2547

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa
___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Hi,

On Wed, Aug 24, 2022 at 12:03, Vaclav Petras wrote:

>> Back to the original problem, how can we solve the problem with
>> compilation on Windows 2016 without changing the code base of grass78
>> significantly? BTW, I was able to compile grass78 on the same machine a few
>> weeks ago and I don't see any related changes in v.random.html... (?)
>
> The PR looks okay on the surface. Maybe you can just remove the
> problematic character in 7.8.

The source of the problem is a git log message [1], not the manual page itself. I modified the PR [2]:

* the HTML file is decoded using ISO-8859-1
* the git log message is decoded using UTF-8

Martin

[1] https://github.com/OSGeo/grass/commit/3b6d257bdfc18a58dd42c5ab06c69ada99c56a24
[2] https://github.com/OSGeo/grass/pull/2533/files
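
The two bullet points above could look roughly like this. This is a minimal sketch, not the code from the PR: the `decode` signature with an `encoding` parameter and the sample byte strings are assumptions made for illustration.

```python
def decode(data, encoding="utf-8"):
    """Decode bytes with an explicit encoding instead of the OS default."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return str(data)


# Hypothetical inputs standing in for a manual page and `git log` output.
html_bytes = "r\xe9sum\xe9".encode("iso-8859-1")
git_log_bytes = "Á".encode("utf-8")

# Each source gets the encoding it is known to use, regardless of the locale.
html_text = decode(html_bytes, encoding="iso-8859-1")
git_text = decode(git_log_bytes, encoding="utf-8")
print(html_text, git_text)
```

ISO-8859-1 assigns a character to every byte value, so that decode step cannot raise UnicodeDecodeError, although it would produce wrong characters if a file were really UTF-8.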
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
On Wed, Aug 24, 2022, 4:51 AM Martin Landa wrote:

> Hi Vaclav,
>
> On Wed, Aug 24, 2022 at 10:41, Vaclav Petras wrote:
>
>> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just
>> checked that now), so that's what an HTML reader should be using. That's of
>> course not what we want at this point. It just should be UTF-8 everywhere.
>
> +1 for switching to UTF-8
>
>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
>> system-dependent encoding. However, the source code it is using should be
>> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>
> I am not sure why the parser should emit HTML in system-dependent
> encoding. Why simply not use UTF-8 as suggested in PR [1]?

It should emit UTF-8, I don't know what it does now.

> Back to the original problem, how can we solve the problem with
> compilation on Windows 2016 without changing the code base of grass78
> significantly? BTW, I was able to compile grass78 on the same machine a few
> weeks ago and I don't see any related changes in v.random.html... (?)

The PR looks okay on the surface. Maybe you can just remove the problematic character in 7.8.
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Hi Vaclav,

On Wed, Aug 24, 2022 at 10:41, Vaclav Petras wrote:

> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just
> checked that now), so that's what an HTML reader should be using. That's of
> course not what we want at this point. It just should be UTF-8 everywhere.

+1 for switching to UTF-8

> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
> system-dependent encoding. However, the source code it is using should be
> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.

I am not sure why the parser should emit HTML in system-dependent encoding. Why simply not use UTF-8 as suggested in PR [1]?

Back to the original problem, how can we solve the problem with compilation on Windows 2016 without changing the code base of grass78 significantly? BTW, I was able to compile grass78 on the same machine a few weeks ago and I don't see any related changes in v.random.html... (?)

Martin
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Hi Martin,

On Wed, 24 Aug 2022 at 04:25, Martin Landa wrote:

> the question is also why we are using default OS encoding to decode HTML
> pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?

There seems to be some general confusion around that, or more likely it is just some legacy code. The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just checked that now), so that's what an HTML reader should be using. That's of course not what we want at this point. It just should be UTF-8 everywhere.

The HTML files may already use UTF-8 (?), but the parser may emit HTML in system-dependent encoding. However, the source code it is using should be UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.

Vaclav
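
If a reader of the generated HTML should follow whatever charset the file itself declares, it could look for that declaration before decoding. This is a rough sketch, assuming the charset appears in a `<meta>` tag near the top of the file; the sample markup and the function names `declared_charset` and `read_html` are hypothetical and not taken from parser_html.c or mkhtml.py.

```python
import re

# Matches charset=iso-8859-1 as well as charset="utf-8" (quote is optional).
_CHARSET_RE = re.compile(rb"charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(html_bytes, default="utf-8"):
    """Return the charset named in the first 1024 bytes, or `default`."""
    match = _CHARSET_RE.search(html_bytes[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def read_html(html_bytes):
    """Decode HTML bytes using the charset the document declares."""
    return html_bytes.decode(declared_charset(html_bytes))


# Hypothetical page fragment declaring iso-8859-1, with one non-ASCII byte.
sample = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    b"<p>caf\xe9</p>"
)
print(declared_charset(sample))
print(read_html(sample))
```

Once everything is switched to UTF-8, a lookup like this becomes unnecessary and a fixed `decode("utf-8")` is enough.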
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
On Wed, Aug 24, 2022 at 10:24, Martin Landa wrote:

> the question is also why we are using default OS encoding to decode HTML
> pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?

see also related PR: https://github.com/OSGeo/grass/pull/2533

Martin
Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Dear all,

On Tue, Aug 23, 2022 at 18:49, Martin Landa wrote:

> python3 /usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random > /usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
> Traceback (most recent call last):
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 648, in <module>
>     git_commit = get_last_git_commit(
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 235, in get_last_git_commit
>     stdout = decode(stdout)
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 111, in decode
>     return bytes_.decode(enc)
>   File "C:\\OSGeo4W\\apps\\Python39\lib\encodings\cp1250.py", line 15, in decode
>     return codecs.charmap_decode(input,errors,decoding_table)
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 58: character maps to <undefined>

the question is also why we are using default OS encoding to decode HTML pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?

Martin

[1] https://github.com/OSGeo/grass/blob/releasebranch_7_8/tools/mkhtml.py#L93
[GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError
Hi,

currently the wingrass78 builds on wingrass.fsv.cvut.cz fail with [1]:

VERSION_NUMBER=7.8.8dev VERSION_DATE=2022 MODULE_TOPDIR=../.. \
python3 /usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random > /usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
Traceback (most recent call last):
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 648, in <module>
    git_commit = get_last_git_commit(
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 235, in get_last_git_commit
    stdout = decode(stdout)
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", line 111, in decode
    return bytes_.decode(enc)
  File "C:\\OSGeo4W\\apps\\Python39\lib\encodings\cp1250.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 58: character maps to <undefined>

On Windows 2016 I can't change the encoding to UTF-8 (I am getting a similar error with cp1252). I tried setting LC_ALL and PYTHONIOENCODING, but nothing helped. The default encoding reported by locale.getdefaultlocale() is still cp1250/cp1252 and not UTF-8. Any idea how I can change the default encoding on Windows to UTF-8?

Thanks in advance!

Martin

[1] https://wingrass.fsv.cvut.cz/grass78/x86_64/logs/log-r1f724052b-1/package.log
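
The failure in the traceback can be reproduced without GRASS. This is a minimal sketch, assuming the offending text contains a character whose UTF-8 encoding includes byte 0x81; "Á" is used as a stand-in because the actual text is not reproduced here, and `try_decode` is a hypothetical helper.

```python
# Stand-in for the bytes being decoded; "Á" (U+00C1) is b"\xc3\x81" in UTF-8.
raw = "Á".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Byte 0x81 is unassigned in Python's cp1250 and cp1252 codecs, so both fail;
# decoding the same bytes as UTF-8 succeeds.
for enc in ("cp1250", "cp1252", "utf-8"):
    print(enc, "->", try_decode(raw, enc))
```

This matches the observation that switching between cp1250 and cp1252 does not help: the bytes are UTF-8, so only a UTF-8 decode is correct.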