Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-27 Thread Martin Landa
Hi,

On Wed, 24 Aug 2022 at 12:03, Vaclav Petras wrote:

>> +1 for switching to UTF-8
>>
>>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
>>> system-dependent encoding. However, the source code it is using should be
>>> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>>
>> I am not sure why the parser should emit HTML in system-dependent
>> encoding. Why simply not use UTF-8 as suggested in PR [1]?
>>
>
> It should emit UTF-8, I don't know what it does now.
>

I have created a new PR [1]. Martin

[1] https://github.com/OSGeo/grass/pull/2547

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa
___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-26 Thread Martin Landa
Hi,

On Wed, 24 Aug 2022 at 12:03, Vaclav Petras wrote:

>> Back to the original problem, how can we solve the problem with
>> compilation on Windows 2016 without changing the code base of grass78
>> significantly? BTW, I was able to compile grass78 on the same machine a few
>> weeks ago and I don't see any related changes in v.random.html... (?)
>>
>
> The PR looks okay on the surface. Maybe you can just remove the
> problematic character in 7.8.
>

The source of the problem is a git log message [1], not the manual page
itself. I modified the PR [2]:

* the HTML file is decoded using ISO-8859-1
* the git log message is decoded using UTF-8
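
A minimal sketch of that split, for illustration only. The helper names
(`decode_html`, `decode_git_log`) and the constants are hypothetical and are
not taken from the PR or from mkhtml.py; the sketch only shows each byte
source being decoded with its own fixed encoding instead of the OS default:

```python
# Hypothetical helpers; the names are illustrative and not from mkhtml.py.
HTML_ENCODING = "iso-8859-1"  # charset the generated HTML files declare
GIT_LOG_ENCODING = "utf-8"    # assumed encoding of the git log output


def decode_html(data: bytes) -> str:
    """Decode HTML page bytes; ISO-8859-1 maps every byte value to a character."""
    return data.decode(HTML_ENCODING)


def decode_git_log(data: bytes) -> str:
    """Decode git log output as UTF-8, independent of the system locale."""
    return data.decode(GIT_LOG_ENCODING)


# 0xE9 is "é" in ISO-8859-1; "Á" is the two bytes 0xC3 0x81 in UTF-8.
html_text = decode_html(b"<p>caf\xe9</p>")
log_text = decode_git_log("Á fix".encode("utf-8"))
```

Because ISO-8859-1 defines all 256 byte values, the HTML branch cannot raise
UnicodeDecodeError; only the git log branch depends on the input being valid
UTF-8.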

Martin

[1]
https://github.com/OSGeo/grass/commit/3b6d257bdfc18a58dd42c5ab06c69ada99c56a24
[2] https://github.com/OSGeo/grass/pull/2533/files

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-24 Thread Vaclav Petras
On Wed, Aug 24, 2022, 4:51 AM Martin Landa wrote:

> Hi Vaclav,
>
> On Wed, 24 Aug 2022 at 10:41, Vaclav Petras wrote:
>
>> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just
>> checked that now), so that's what an HTML reader should be using. That's of
>> course not what we want at this point. It just should be UTF-8 everywhere.
>>
>> 
>>
>
> +1 for switching to UTF-8
>
>> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
>> system-dependent encoding. However, the source code it is using should be
>> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>>
>
> I am not sure why the parser should emit HTML in system-dependent
> encoding. Why simply not use UTF-8 as suggested in PR [1]?
>

It should emit UTF-8, I don't know what it does now.


> Back to the original problem, how can we solve the problem with
> compilation on Windows 2016 without changing the code base of grass78
> significantly? BTW, I was able to compile grass78 on the same machine a few
> weeks ago and I don't see any related changes in v.random.html... (?)
>

The PR looks okay on the surface. Maybe you can just remove the problematic
character in 7.8.


> Martin
>
> --
> Martin Landa
> http://geo.fsv.cvut.cz/gwiki/Landa
> http://gismentors.cz/mentors/landa
>


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-24 Thread Martin Landa
Hi Vaclav,

On Wed, 24 Aug 2022 at 10:41, Vaclav Petras wrote:

> The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just
> checked that now), so that's what an HTML reader should be using. That's of
> course not what we want at this point. It just should be UTF-8 everywhere.
>
> 
>

+1 for switching to UTF-8

> The HTML files may already use UTF-8 (?), but the parser may emit HTML in
> system-dependent encoding. However, the source code it is using should be
> UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.
>

I am not sure why the parser should emit HTML in system-dependent encoding.
Why simply not use UTF-8 as suggested in PR [1]?

Back to the original problem, how can we solve the problem with compilation
on Windows 2016 without changing the code base of grass78 significantly?
BTW, I was able to compile grass78 on the same machine a few weeks ago and
I don't see any related changes in v.random.html... (?)

Martin

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-24 Thread Vaclav Petras
Hi Martin,

On Wed, 24 Aug 2022 at 04:25, Martin Landa wrote:

>
> the question is also why we are using default OS encoding to decode HTML
> pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?
>

There seems to be some general confusion around that, or more likely just
some legacy code.

The lib/gis/parser_html.c puts iso-8859-1 into the HTML files (I just
checked that now), so that's what an HTML reader should be using. That's of
course not what we want at this point. It just should be UTF-8 everywhere.
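
To illustrate what "an HTML reader should be using" could look like, here is
a rough sketch that reads the charset a page declares and decodes with it.
The helper name is hypothetical (not part of mkhtml.py), and the sample meta
tag is only an assumed example of a page declaring iso-8859-1:

```python
import re

# Hypothetical helper; not part of mkhtml.py or the GRASS code base.
# Matches the charset name that follows "charset=" in a meta tag.
_CHARSET_RE = re.compile(rb"charset=[\"']?([\w.-]+)", re.IGNORECASE)


def decode_html_by_declared_charset(data: bytes, fallback: str = "utf-8") -> str:
    """Decode HTML bytes with the charset named in the page, else the fallback."""
    match = _CHARSET_RE.search(data)
    encoding = match.group(1).decode("ascii") if match else fallback
    return data.decode(encoding)


# Assumed sample input: a page that declares iso-8859-1 and contains byte 0xE9.
page = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    b"<p>caf\xe9</p>"
)
text = decode_html_by_declared_charset(page)
```

Once everything is UTF-8, a helper like this becomes unnecessary and a plain
decode("utf-8") is enough.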



The HTML files may already use UTF-8 (?), but the parser may emit HTML in
system-dependent encoding. However, the source code it is using should be
UTF-8 or more likely it is simply ASCII, so perhaps not much to worry about.

Vaclav


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-24 Thread Martin Landa
On Wed, 24 Aug 2022 at 10:24, Martin Landa wrote:

> the question is also why we are using default OS encoding to decode HTML
> pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?
>

see also related PR: https://github.com/OSGeo/grass/pull/2533

Martin

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa


Re: [GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-24 Thread Martin Landa
Dear all,

On Tue, 23 Aug 2022 at 18:49, Martin Landa wrote:

> python3 /usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random
> > /usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
>
> Traceback (most recent call last):
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", 
> line 648, in <module>
> git_commit = get_last_git_commit(
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", 
> line 235, in get_last_git_commit
> stdout = decode(stdout)
>   File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py", 
> line 111, in decode
> return bytes_.decode(enc)
>   File "C:\\OSGeo4W\\apps\\Python39\lib\encodings\cp1250.py", line 15, in 
> decode
> return codecs.charmap_decode(input,errors,decoding_table)
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 58:
> character maps to <undefined>
>

the question is also why we are using default OS encoding to decode HTML
pages [1]. Couldn't we simply use UTF-8 regardless of OS system locale?
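
A rough sketch of what that could look like for the output of an external
command. The helper `run_and_decode` is hypothetical and is not the code in
mkhtml.py; a child Python process stands in for the git call so that the
example runs without a repository:

```python
import subprocess
import sys


def run_and_decode(command, encoding="utf-8"):
    """Run a command and decode its stdout with a fixed encoding.

    Hypothetical helper: the locale-dependent default encoding is never used.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    # errors="replace" avoids an exception if the output is not valid UTF-8.
    return result.stdout.decode(encoding, errors="replace")


# Stand-in for a git call: a child process that writes the UTF-8 bytes of "Á".
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'\\xc3\\x81')"]
output = run_and_decode(child)
```

Whether to use errors="replace" or let the exception propagate is a design
choice; replacing keeps the build running at the cost of hiding bad input.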

Martin

[1]
https://github.com/OSGeo/grass/blob/releasebranch_7_8/tools/mkhtml.py#L93

-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa


[GRASS-dev] mkhtml fails on Windows with UnicodeDecodeError

2022-08-23 Thread Martin Landa
Hi,

currently wingrass78 builds on wingrass.fsv.cvut.cz fail with [1]:

VERSION_NUMBER=7.8.8dev VERSION_DATE=2022 MODULE_TOPDIR=../.. \
python3
/usr/src/grass78/dist.x86_64-w64-mingw32/tools/mkhtml.py v.random >
/usr/src/grass78/dist.x86_64-w64-mingw32/docs/html/v.random.html
Traceback (most recent call last):
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py",
line 648, in <module>
git_commit = get_last_git_commit(
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py",
line 235, in get_last_git_commit
stdout = decode(stdout)
  File "C:\msys64\usr\src\grass78\dist.x86_64-w64-mingw32\tools\mkhtml.py",
line 111, in decode
return bytes_.decode(enc)
  File "C:\\OSGeo4W\\apps\\Python39\lib\encodings\cp1250.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position
58: character maps to <undefined>

On Windows 2016 I can't change encoding to UTF-8 (I am getting a
similar error with cp1252). I tried to set up LC_ALL or
PYTHONIOENCODING, but nothing helped. Default encoding reported by
locale.getdefaultlocale() is still cp1250/cp1252 and not UTF-8. Any
idea how I can change default encoding on Windows to UTF-8?
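
For illustration, a short sketch of why the byte 0x81 breaks under both code
pages while the same bytes are fine as UTF-8. The character "Á" is only an
assumed example of text whose UTF-8 encoding contains 0x81; the actual bytes
in the failing input may differ:

```python
# "Á" (U+00C1) is only an example of a character whose UTF-8 encoding
# contains the byte 0x81; the real input may contain something else.
data = "Á".encode("utf-8")  # the two bytes 0xC3 0x81

failures = []
for codepage in ("cp1250", "cp1252"):
    try:
        data.decode(codepage)
    except UnicodeDecodeError:
        # Python's tables for these code pages leave byte 0x81 undefined.
        failures.append(codepage)

# Decoding with an explicit encoding does not depend on the system locale.
decoded = data.decode("utf-8")
```
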

Thanks in advance! Martin

[1] 
https://wingrass.fsv.cvut.cz/grass78/x86_64/logs/log-r1f724052b-1/package.log


-- 
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa