Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread José Abílio Matos
On Monday, 3 January 2022 18.32.14 WET José Abílio Matos wrote:
> In any case I found other cases where the code is wrong (like the BOM
> removal that does not work) and that it does not give an error although it
> is wrong.

For future reference: this is an issue that happens silently. In this case 
Python's flexibility comes back to bite us.

Notice the following code:

line = "\357\273\277"
if line[0:3] == b"\357\273\277":
    print("BOM found")

This code will give different results in Python 2 and 3.

The problem is that in Python 2 "\357\273\277" and b"\357\273\277" have the 
same type, while in Python 3 they have different types. That would not be a 
problem if it were not for another feature of Python: it is possible to compare 
values of different types for equality, and in that case the answer is 
naturally False.

The reason for this is to allow comparison with None, which is commonly used 
as a sentinel in lots of code.
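
A minimal Python 3 sketch of the comparison described above. The helper name
has_utf8_bom is hypothetical and not part of layout2layout.py;
codecs.BOM_UTF8 is the standard-library constant for the same three bytes.

```python
import codecs

text_bom = "\357\273\277"    # str: three code points U+00EF, U+00BB, U+00BF
bytes_bom = b"\357\273\277"  # bytes: the UTF-8 byte order mark EF BB BF

# In Python 3 a str never equals a bytes object, and no exception is raised.
silently_false = (text_bom == bytes_bom)


def has_utf8_bom(line):
    """Return True if a bytes line starts with the UTF-8 BOM (hypothetical helper)."""
    return line[0:3] == codecs.BOM_UTF8


print(silently_false)                            # False under Python 3
print(has_utf8_bom(b"\357\273\277Format 95\n"))  # True: bytes input
print(has_utf8_bom("\357\273\277Format 95\n"))   # False: str input never matches
```

Running the interpreter with the -b option makes str/bytes comparisons emit a
BytesWarning (-bb turns it into an error), which can help to locate such
silent mismatches.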

This bug is insidious because the code seems to work: there are no errors like 
those that led the original reporter to propose the fix, but the problem is 
still there. :-(

As far as I understand, the problem is specific to comparisons, since the other 
operations fail when they are given different types...
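
A short Python 3 sketch of that difference, with made-up values. Note that
operations built on equality (dictionary lookups, membership tests) are
assumed here to belong to the silent group as well, since they do not raise
either.

```python
outcomes = {}

left = b"Columns "
try:
    left + "1"                        # bytes + str
except TypeError:
    outcomes["concat"] = "TypeError"

try:
    b"".join(["a", "b"])              # joining str items with a bytes separator
except TypeError:
    outcomes["join"] = "TypeError"

# Equality-based operations do not raise; they just fail to match.
conv = {"article": "Articles"}        # str keys, like the old ConvDict
outcomes["equal"] = (b"article" == "article")
outcomes["lookup"] = conv.get(b"article")
outcomes["member"] = (b"article" in ["article"])

print(outcomes)
```

Under Python 3 the first two entries record a TypeError, while the last three
are False, None and False without any error being reported.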
-- 
José Abílio
-- 
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel


Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread José Abílio Matos
On Monday, 3 January 2022 16.02.51 WET Pavel Sanda wrote:
> On Mon, Jan 03, 2022 at 03:16:47PM +, José Abílio Matos wrote:
> > If you want I can take care of that, in 2.4, and see if there are cases
> > where the conversion is missing.
> 
> Please do, I suffer from ophidiophobia.

You keep insisting that the Norwegian Blue parrot is alive: :-D
https://en.wikipedia.org/wiki/Dead_Parrot_sketch

The name comes from Monty Python's Flying Circus. :-)

In any case I found other places where the code is wrong (like the BOM removal, 
which does not work) and yet it does not give an error.

I am attaching the patch, since I do not have time to test it.

> > @Riki: is it possible to have a layout file such that the encoding is not
> > utf-8?
> 
> The original reports most likely downloaded layouts referred in our wiki
> (https://wiki.lyx.org/Layouts/Layouts , LyXBook), so I guess that layout
> encoding is not in our hands.

No, the issue is whether LyX accepts encodings other than UTF-8. If it does not, 
all this care is unnecessary. Looking at the similar pref2prefs.py, we see that 
it assumes utf-8.

> Pavel


-- 
José Abílio

diff --git a/lib/scripts/layout2layout.py b/lib/scripts/layout2layout.py
index 7e40bfbe12..88ada1213e 100644
--- a/lib/scripts/layout2layout.py
+++ b/lib/scripts/layout2layout.py
@@ -256,7 +256,7 @@ currentFormat = 95
 # New textclass tag BibInToc
 
 # Incremented to format 77, 6 August 2019 by spitz
-# New textclass tag PageSize (= default page size) 
+# New textclass tag PageSize (= default page size)
 # and textclass option PageSize (= list of available page sizes)
 
 # Incremented to format 78, 6 August 2019 by spitz
@@ -354,7 +354,7 @@ def error(message):
 
 def trim_bom(line):
     " Remove byte order mark."
-    if line[0:3] == "\357\273\277":
+    if line[0:3] == b"\357\273\277":
         return line[3:]
     else:
         return line
@@ -444,8 +444,8 @@ def convert(lines, end_format):
 # for categories
 re_Declaration = re.compile(b'^#\\s*\\Declare\\w+Class.*$')
 re_ExtractCategory = re.compile(b'^(#\\s*\\Declare\\w+Class(?:\\[[^]]*?\\])?){([^(]+?)\\s+\\(([^)]+?)\\)\\s*}\\s*$')
-ConvDict = {"article": "Articles", "book" : "Books", "letter" : "Letters", "report": "Reports",
-"presentation" : "Presentations", "curriculum vitae" : "Curricula Vitae", "handout" : "Handouts"}
+ConvDict = {b"article": b"Articles", b"book": b"Books", b"letter": b"Letters", b"report": b"Reports",
+b"presentation": b"Presentations", b"curriculum vitae": b"Curricula Vitae", b"handout": b"Handouts"}
 # Arguments
 re_OptArgs = re.compile(b'^(\\s*)OptionalArgs(\\s+)(\\d+)\\D*$', re.IGNORECASE)
 re_ReqArgs = re.compile(b'^(\\s*)RequiredArgs(\\s+)(\\d+)\\D*$', re.IGNORECASE)
@@ -615,7 +615,7 @@ def convert(lines, end_format):
 continue
 col  = match.group(2)
 if col == "collapsable":
-lines[i] = match.group(1) + "collapsible"
+lines[i] = match.group(1) + b"collapsible"
 i += 1
 continue
 
@@ -833,7 +833,7 @@ def convert(lines, end_format):
 # Insert the required number of arguments at the end of the style definition
 match = re_End.match(lines[i])
 if match:
-newarg = ['']
+newarg = [b'']
 # First the optionals (this is the required order pre 2.1)
 if opts > 0:
 if opts == 1:
@@ -1283,7 +1283,7 @@ def convert(lines, end_format):
 if latextype == b"item_environment" and label.lower() == b"counter_enumi":
 lines[labeltype_line] = re_LabelType.sub(b'\\1\\2\\3Enumerate', lines[labeltype_line])
 # Don't add the LabelCounter line later
-counter = ""
+counter = b""
 
 # Replace
 #
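
One context line in the patch above, if col == "collapsable":, still compares
against a str literal. Assuming col comes from a bytes regex match (as the
surrounding patterns suggest), that test would be another instance of the
silent comparison problem under Python 3. A minimal sketch, with a
hypothetical pattern and input line that are not taken from layout2layout.py:

```python
import re

# Hypothetical stand-in for the bytes regex that produces `match` in convert().
re_setting = re.compile(b"^(\\s*\\w+\\s+)(\\w+)\\s*$")

line = b"    SomeTag collapsable"     # made-up layout line
match = re_setting.match(line)
col = match.group(2)                  # groups of a bytes pattern are bytes

str_test = (col == "collapsable")     # False in Python 3: branch never taken
bytes_test = (col == b"collapsable")  # True

if bytes_test:
    line = match.group(1) + b"collapsible"

print(str_test, bytes_test, line)
```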


Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread Pavel Sanda
On Mon, Jan 03, 2022 at 03:16:47PM +, José Abílio Matos wrote:
> If you want I can take care of that, in 2.4, and see if there are cases where 
> the conversion is missing.

Please do, I suffer from ophidiophobia.

> @Riki: is it possible to have a layout file such that the encoding is not 
> utf-8?

The original reports most likely downloaded layouts referred in our wiki
(https://wiki.lyx.org/Layouts/Layouts , LyXBook), so I guess that layout
encoding is not in our hands.

Pavel


Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread José Abílio Matos
On Monday, 3 January 2022 14.55.53 WET Pavel Sanda wrote:
> If I get that right the part of the "..." -> b"..." should be committed to
> 2.4?
> 
> Pavel

Good point. Yes, it should (I thought that it already was), especially in order 
to be consistent with all the other code that already does that.

The issue is that in Python 2 str (the string type) does not have an encoding 
associated with it; later a new string type (unicode) was added, where the 
string does have an associated encoding.
In Python 3 strings became encoding-aware and the previous strings were 
renamed bytes. So b"..." (bytes string) is a no-op in Python 2, because it 
corresponds to "...". Similarly u"..." (unicode string) is a no-op in Python 3, 
since it corresponds to "...".
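
The prefix behaviour can be checked with a small sketch that runs under both
interpreters. The PY2 flag is defined here for illustration, on the assumption
that it is computed from sys.version_info.

```python
import sys

PY2 = sys.version_info[0] == 2

plain = "abc"
prefixed_bytes = b"abc"
prefixed_text = u"abc"

# Python 2: b"..." has the same type as "..." (str); u"..." is unicode.
# Python 3: u"..." has the same type as "..." (str); b"..." is bytes.
b_is_noop = type(prefixed_bytes) is type(plain)
u_is_noop = type(prefixed_text) is type(plain)

print(PY2, b_is_noop, u_is_noop)
```

Under Python 3 this prints False False True; under Python 2 the three values
are reversed.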

If you want I can take care of that, in 2.4, and see if there are cases where 
the conversion is missing.

@Riki: is it possible to have a layout file such that the encoding is not 
utf-8?

-- 
José Abílio


Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread Pavel Sanda
On Mon, Jan 03, 2022 at 02:04:20PM +, José Abílio Matos wrote:
> Looking into further detail I would easily say that the first part of the
> patch is correct (change "..." to b"...").
> 
> The second part where it changes sys.stdin to sys.stdin.buffer is probably
> incorrect:
> 
> The similar code in 2.4 is:
> # Open files
> if options.input_file:
>     source = open(options.input_file, 'rb')
> elif PY2:
>     source = sys.stdin
> else:
>     source = sys.stdin.buffer
> 
> if options.output_file:
>     output = open(options.output_file, 'wb')
> elif PY2:
>     output = sys.stdout
> else:
>     output = sys.stdout.buffer
> 
> So this should be the right change, keep the previous form if PY2 (python 2
> is used) else use the new call.

If I get that right the part of the "..." -> b"..." should be committed to 2.4?

Pavel


Re: Fwd: String/Bytes Problem in layout2layout.py

2022-01-03 Thread José Abílio Matos
On Wednesday, 29 December 2021 14.52.29 WET Pavel Sanda wrote:
> Jose,
> 
> are the proposed changes sensible?
> I remember similar patches flowing into the python codebase before.

The changes are reasonable for python 3.
I am not so sure about python 2 (because we support it) although it seems 
likely. :-)

We are using bytes here to avoid choosing an encoding, which should be utf-8 
for all layout files, if I am not mistaken. In lyx2lyx we do this in 2 passes: 
first we determine the encoding and then we use that information to read the 
file with the correct encoding.
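
A small sketch of why working on bytes sidesteps the encoding question. The
sample line is made up: it contains a Latin-1 encoded character that is not
valid UTF-8.

```python
import re

# A layout-like line containing a Latin-1 "e acute" (byte E9), which is not
# valid UTF-8. The content is invented for illustration.
raw_line = b"# Caf\xe9 style, collapsable\n"

# Working on bytes: the substitution succeeds and the odd byte is untouched.
converted = re.sub(b"collapsable", b"collapsible", raw_line)

# Working on text would force a decoding step first, which can fail.
try:
    raw_line.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = type(exc).__name__

print(converted)
print(decode_error)
```

The byte-level substitution goes through regardless of the file's encoding,
while decoding the same line as UTF-8 raises UnicodeDecodeError.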

Looking into further detail I would easily say that the first part of the patch 
is correct (change "..." to b"...").

The second part where it changes sys.stdin to sys.stdin.buffer is probably 
incorrect:

The similar code in 2.4 is:
# Open files
if options.input_file:
    source = open(options.input_file, 'rb')
elif PY2:
    source = sys.stdin
else:
    source = sys.stdin.buffer

if options.output_file:
    output = open(options.output_file, 'wb')
elif PY2:
    output = sys.stdout
else:
    output = sys.stdout.buffer

So this should be the right change, keep the previous form if PY2 (python 2 is 
used) else use the new call.
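
The same selection logic as a self-contained sketch. The function name
open_streams and its parameters are hypothetical, and the definition of PY2 is
an assumption; only the branching mirrors the 2.4 snippet above.

```python
import sys

PY2 = sys.version_info[0] == 2  # assumed definition of the PY2 flag


def open_streams(input_file=None, output_file=None):
    """Return (source, output) as binary streams (hypothetical helper)."""
    if input_file:
        source = open(input_file, "rb")
    elif PY2:
        source = sys.stdin           # Python 2: stdin already yields str (bytes)
    else:
        source = sys.stdin.buffer    # Python 3: the underlying binary stream

    if output_file:
        output = open(output_file, "wb")
    elif PY2:
        output = sys.stdout
    else:
        output = sys.stdout.buffer
    return source, output
```

With both file names given, the two returned objects read and write bytes in
either Python version; without them, the standard streams are used in their
binary form.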

-- 
José Abílio


Fwd: String/Bytes Problem in layout2layout.py

2021-12-29 Thread Pavel Sanda
Jose,

are the proposed changes sensible?
I remember similar patches flowing into the python codebase before.

Pavel
- Forwarded message from "Leo L. Schwab"  -

From: "Leo L. Schwab" 
To: Debian Bug Tracking System 
Subject: Bug#1002821: lyx-common: String/Bytes Problem in layout2layout.py

Package: lyx-common
Version: 2.3.6-1
Severity: normal
Tags: patch upstream
X-Debbugs-Cc: ew...@ewhac.org

Dear Maintainer,

Discovered this while trying to use Editorium's LyXBook modules.
layout2layout.py was konking out with "TypeError: cannot use a bytes
pattern on a string-like object."  After a bunch of debugging, I found
some strings in the script that hadn't been bytes-ified, which seemed to
fix the problem.  Patch attached.
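
A minimal way to reproduce this class of error under Python 3. The pattern
below is an assumed stand-in that only borrows the name re_End from the
script, and try_match is a hypothetical helper.

```python
import re

# Assumed pattern; only the name re_End is borrowed from the script.
re_End = re.compile(b"^(\\s*)(End)(\\s*)$", re.IGNORECASE)


def try_match(line):
    """Return True/False for a match, or the error text if the types clash."""
    try:
        return re_End.match(line) is not None
    except TypeError as exc:
        return str(exc)


print(try_match(b"End\n"))   # bytes line: matches
print(try_match("End\n"))    # str line: TypeError about a bytes pattern
```

Any str value that reaches a bytes pattern triggers the error, which is why
un-prefixed literals such as '' or "" in the script can surface this way.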

Schwab


-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-security
  APT policy: (500, 'testing-security'), (500, 'unstable'), (500, 'testing'), 
(500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.15.0-2-amd64 (SMP w/12 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, 
TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages lyx-common depends on:
ii  python3 3.9.8-1
ii  tex-common  6.17

Versions of packages lyx-common recommends:
ii  lyx  2.3.6-1

lyx-common suggests no packages.

-- no debconf information

--- /usr/share/lyx/scripts/layout2layout.py 2020-12-01 02:33:35.0 -0800
+++ ./layout2layout.py  2021-12-29 01:04:59.614016427 -0800
@@ -484,8 +484,8 @@
 i += 1
 continue
 col  = match.group(2)
-if col == "collapsable":
-lines[i] = match.group(1) + "collapsible"
+if col == b"collapsable":
+lines[i] = match.group(1) + b"collapsible"
 i += 1
 continue
 
@@ -703,7 +703,7 @@
 # Insert the required number of arguments at the end of the style definition
 match = re_End.match(lines[i])
 if match:
-newarg = ['']
+newarg = [b'']
 # First the optionals (this is the required order pre 2.1)
 if opts > 0:
 if opts == 1:
@@ -1153,7 +1153,7 @@
 if latextype == b"item_environment" and label.lower() == b"counter_enumi":
 lines[labeltype_line] = re_LabelType.sub(b'\\1\\2\\3Enumerate', lines[labeltype_line])
 # Don't add the LabelCounter line later
-counter = ""
+counter = b""
 
 # Replace
 #
@@ -1227,12 +1227,12 @@
 if options.input_file:
 source = open(options.input_file, 'rb')
 else:
-source = sys.stdin
+source = sys.stdin.buffer
 
 if options.output_file:
 output = open(options.output_file, 'wb')
 else:
-output = sys.stdout
+output = sys.stdout.buffer
 
 if options.format > currentFormat:
 error("Format %i does not exist" % options.format);


- End forwarded message -