D15068: Bindings: Correct handling of sources containing utf-8

2018-09-14 Thread Stefan Brüns
This revision was automatically updated to reflect the committed changes.
Closed by commit R240:28da9e614f66: Bindings: Correct handling of sources 
containing utf-8 (authored by bruns).

CHANGED PRIOR TO COMMIT
  https://phabricator.kde.org/D15068?vs=40392&id=41646#toc

REPOSITORY
  R240 Extra CMake Modules

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D15068?vs=40392&id=41646

REVISION DETAIL
  https://phabricator.kde.org/D15068

AFFECTED FILES
  find-modules/sip_generator.py

To: bruns, #frameworks, lbeltrame
Cc: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem, 
michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-14 Thread Stefan Brüns
bruns added a comment.


  In D15068#325752 , @bcooksley 
wrote:
  
  > The code itself looks fine to me - the change on line 752 is to allow 
handling of macOS / Windows line endings in addition to just Unix line endings 
I presume?
  
  
  Yes. Thats what python does by default when reading files in text mode 
(converting all newline variants to Unix newlines).

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks, lbeltrame
Cc: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem, 
michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-14 Thread Stefan Brüns
bruns added a comment.


  In D15068#325659 , @jtamate wrote:
  
  > Another solution (not tested here but used in other projects) could be to 
use
  >  with open(source, "r", encoding="utf-8") as f:
  >  (or if the file could contain the aberration BOM, you could use 
"utf-8-sig")
  
  
  Please read the full summary again why reading the source as utf-8 is a bad 
ide - hint: clang expresses ranges in bytes, not codepoints.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks, lbeltrame
Cc: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem, 
michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-14 Thread Luca Beltrame
lbeltrame accepted this revision.
lbeltrame added a comment.
This revision is now accepted and ready to land.


  LGTM.

INLINE COMMENTS

> sip_generator.py:752
>  #
> -return "".join(extract).replace("\n", " ")
> +return b''.join(extract).decode('utf-8').replace("\r\n", " 
> ").replace("\n", " ").replace("\r", " ")
>  

Perhaps mention in the comment that you're doing this for all the types of 
newlines?

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks, lbeltrame
Cc: lbeltrame, bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem, 
michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-14 Thread Ben Cooksley
bcooksley added a comment.


  The code itself looks fine to me - the change on line 752 is to allow 
handling of macOS / Windows line endings in addition to just Unix line endings 
I presume?

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks
Cc: bcooksley, jtamate, kde-frameworks-devel, kde-buildsystem, michaelh, 
ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-13 Thread Jaime Torres Amate
jtamate added a comment.


  Another solution (not tested here but used in other projects) could be to use
  with open(source, "r", encoding="utf-8") as f:
  (or if the file could contain the aberration BOM, you could use "utf-8-sig")
  
  And it should convert automatically all kind of line-ends and still return 
text.
  I'm assuming this is for python 3.5 or later.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks
Cc: jtamate, kde-frameworks-devel, kde-buildsystem, michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-09-13 Thread Stefan Brüns
bruns added a comment.


  Can someone review this? It is currently not possible to build the bindings 
with python3, e.g. for kcoreaddons.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks
Cc: kde-frameworks-devel, kde-buildsystem, michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-08-24 Thread Stefan Brüns
bruns edited the test plan for this revision.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

To: bruns, #frameworks
Cc: kde-frameworks-devel, kde-buildsystem, michaelh, ngraham, bruns


D15068: Bindings: Correct handling of sources containing utf-8

2018-08-24 Thread Stefan Brüns
bruns created this revision.
bruns added a reviewer: Frameworks.
Herald added projects: Frameworks, Build System.
Herald added subscribers: kde-buildsystem, kde-frameworks-devel.
bruns requested review of this revision.

REVISION SUMMARY
  Depending on the locale, python3 may try to decode the source as ASCII
  when the file is opened in text mode. This will fail as soon as the
  code contains utf-8, e.g. (c) symbols.
  
  While it is possible to specify the encoding when reading the file,
  this is bad for several reasons:
  
  - only a very small part of the source is processed via _read_source, no need 
to decode the complete source and store it as string objects
  - the clang Cursor.extent.{start,end}.column refers to bytes, not multibyte 
characters.
  
  While python2 processes utf-8 containing sources without error messages,
  wrong extent borders are also an issue.
  
  The practical impact is low, as the issue only manifests if there is a
  multibyte character in front of *and* on the same line as the read token.

TEST PLAN
  Python3: Build any bindings which contains sources with non-ASCII codepoints,
  e.g. kcoreaddons. Unpatched version fails when using e.g. LANG=C.
  Python2: Both versions generate sources successfully.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

AFFECTED FILES
  find-modules/sip_generator.py

To: bruns, #frameworks
Cc: kde-frameworks-devel, kde-buildsystem, michaelh, ngraham, bruns