[kdev-python] [Bug 395259] Non-ascii text shifts identifier locations

Francis Herne Wed, 22 Aug 2018 08:34:05 -0700

https://bugs.kde.org/show_bug.cgi?id=395259


Francis Herne <m...@flherne.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |CONFIRMED
                 CC|                            |m...@flherne.uk
     Ever confirmed|0                           |1

--- Comment #1 from Francis Herne <m...@flherne.uk> ---
I spent a little while looking at this.

The cause is that the CPython parser (used by KDevelop) returns all offsets in
UTF-8 bytes, while the KTextEditor API uses actual characters.

Any character that takes more than one byte in UTF-8 therefore shifts the
reported column of everything after it on the same line.
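
A minimal sketch of the mismatch, using Python's standard `ast` module rather
than the C API that kdev-python calls (the sample line and variable names are
made up for illustration). The `col_offset` attribute is documented as a UTF-8
byte offset, so it disagrees with the character index as soon as a non-ASCII
character precedes the node:

```python
import ast

# Hypothetical sample line: one two-byte character ("é") before `name`.
line = 's = "é"; name = 1'

tree = ast.parse(line)
# Find the Name node for the identifier `name`.
target = next(
    node for node in ast.walk(tree)
    if isinstance(node, ast.Name) and node.id == "name"
)

byte_col = target.col_offset   # offset as reported by the parser (UTF-8 bytes)
char_col = line.index("name")  # offset counted in characters

print(byte_col, char_col)
```

Each extra byte before the identifier moves the reported column one further to
the right of where an editor counting characters would place it.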

The only way I see to fix this would be to scan for multi-byte characters and
do yet another set of range fixups, which would be quite expensive while
benefitting very few scenarios.
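
A rough sketch of what such a fixup could look like, written in Python for
brevity and not taken from kdev-python. The function names are hypothetical.
The ASCII check is an assumed shortcut so that lines without multi-byte
characters skip the conversion entirely. The second function assumes the
editor side counts UTF-16 code units, as Qt strings do:

```python
def byte_col_to_char_col(line: str, byte_col: int) -> int:
    """Convert a UTF-8 byte offset within `line` to a character offset."""
    if line.isascii():
        # Pure-ASCII line: bytes and characters coincide, nothing to fix.
        return byte_col
    # Decode only the bytes before the offset and count the characters.
    prefix = line.encode("utf-8")[:byte_col]
    return len(prefix.decode("utf-8"))


def byte_col_to_utf16_col(line: str, byte_col: int) -> int:
    """Same conversion, but counting UTF-16 code units (assumption: the
    consumer indexes strings the way Qt's QString does)."""
    if line.isascii():
        return byte_col
    prefix = line.encode("utf-8")[:byte_col].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2
```

For example, `byte_col_to_char_col('s = "é"; name = 1', 10)` gives the
character column of `name`. The two functions differ only for characters
outside the Basic Multilingual Plane, such as most emoji, which take two
UTF-16 code units.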

(We can't strip such characters before feeding the source to the parser,
because they can appear in docstrings or even identifiers.)

The other alternative would be to have our own parser (again). That's clearly
not worthwhile for this alone, but there's already a lot of ugly code to work
around various limitations and lossiness in the CPython parser, and statements
by the CPython devs (e.g. https://bugs.python.org/issue32911#msg313698) suggest
it's only likely to get worse.

