Just note that the zero-width space (ZWSP, U+200B) exists for a very good semantic reason.

It is used to mark word boundaries in Asian languages such as Thai or Lao, where most words are monosyllabic, vowels are written around the initial consonant, and words are not separated by spaces.

Nowadays, most software can break words algorithmically (I am not sure where that information is encoded!), which was not the case 3-4 years ago. In any case, separating words with ZWSP helps ensure proper text flow when paragraphs are reformatted.
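To illustrate why explicit ZWSP boundaries help with reflow, here is a minimal Python sketch of greedy line wrapping that only breaks where a ZWSP occurs. It is not how Scribus or any particular layout engine breaks lines, and the function name `wrap_at_zwsp` is hypothetical. It also assumes width can be counted in code points, which is a simplification: Thai vowel and tone marks are combining characters with no width of their own.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, used here as an explicit word boundary


def wrap_at_zwsp(text, width):
    """Greedily wrap text into lines, breaking only where a ZWSP occurs.

    Width is counted in code points (a simplification). A word longer than
    `width` is put on a line of its own rather than being split.
    """
    lines = []
    line_words = []
    line_len = 0
    for word in text.split(ZWSP):
        # Start a new line if this word would overflow the current one
        if line_words and line_len + len(word) > width:
            lines.append(ZWSP.join(line_words))
            line_words, line_len = [], 0
        line_words.append(word)
        line_len += len(word)
    if line_words:
        lines.append(ZWSP.join(line_words))
    return lines


print(wrap_at_zwsp("one\u200btwo\u200bsix\u200bseven", 6))
# ['one\u200btwo', 'six', 'seven']
```

Because the ZWSPs stay inside each output line, joining the lines with ZWSP gives back the original text, which is what allows the same paragraph to be rewrapped later at a different width.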

Silvain


On 01.01.22 at 03:18, Matt Miller wrote:

On Fri, Dec 31, 2021, at 15:59, Gregory Pittman wrote:
On 12/31/21 14:19, Matt Miller wrote:
On Fri, Dec 31, 2021, at 07:11, Gregory Pittman wrote:
On 12/30/21 20:49, Matt Miller wrote:
I'm loading a text frame from a UTF-8 encoded text file, and within my Scribus
Python code I want to search for the standard newline character, ASCII value
10. When I see an ASCII 10 as the line separator I want to apply a special
paragraph style to the following paragraph. Most paragraphs end with the 
Unicode paragraph separator character, \u2029, and in those cases the default 
paragraph style is fine.

My problem is that both these types of characters match '\r' when I use
re.search in Python. Also, if I select either line separator character and then
call getText(), I get a '\r' no matter what. I've confirmed that my file encoding is
UTF-8. What am I missing? How can I search for a simple '\n' character?
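For comparison, in a plain Python string (outside Scribus) the three characters stay distinct and `re` matches each one separately. A minimal sketch, using a made-up sample string and a hypothetical helper name `find_breaks`: if this reports distinct positions but the text returned by `getText()` does not, the characters are being conflated before the string ever reaches Python.

```python
import re

# Regex patterns for the three separators discussed in this thread
BREAKS = {
    "LF": r"\n",       # U+000A LINE FEED
    "CR": r"\r",       # U+000D CARRIAGE RETURN
    "PS": r"\u2029",   # U+2029 PARAGRAPH SEPARATOR
}


def find_breaks(text):
    """Return, for each separator, the indices at which it occurs in text."""
    return {
        name: [m.start() for m in re.finditer(pattern, text)]
        for name, pattern in BREAKS.items()
    }


sample = "First.\u2029Second.\nThird.\rFourth."
print(find_breaks(sample))  # {'LF': [14], 'CR': [21], 'PS': [6]}
```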
Hi Matt,

You don't say what OS you're using.
Linux

Maybe running dos2unix on the text would help.
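The core of what `dos2unix` does in its default mode (turning Windows CRLF pairs into a single LF) can be sketched in Python as below. This is a simplification of the tool, and the helper names are hypothetical. Note that it leaves lone CR and U+2029 untouched, so on its own it would not change how Scribus treats paragraph separators.

```python
def crlf_to_lf(text):
    """Replace Windows CRLF pairs with a single LF; lone CR and U+2029 are left alone."""
    return text.replace("\r\n", "\n")


def convert_file(path):
    # newline="" stops Python from translating line endings while reading and
    # writing, so the CRLF pairs are actually visible to crlf_to_lf()
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(crlf_to_lf(text))
```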
Well, my ideal workflow is one where the contents of the text file instruct 
Scribus what paragraph and character styles to use throughout the document. So, 
I'm careful to put exactly the characters I want into the text file. I've been 
largely successful, but now I've run into a case where it seems Scribus (or 
Python) is losing information when I load the file. From inside Scribus I can't 
distinguish between a Unicode paragraph separator, \u2029, and a simple line 
feed, \u000A.

I'm able to open my text file from the Python console and dump it out to see that a newline is displayed as
"\n" and a paragraph separator is displayed as "\u2029". So I suspect the problem
is with Scribus, or with how I'm using it. I'm loading the file using insertHtmlText(), but I get the same bad
behavior from the GUI when I do "Content | Get Text..." and load the file manually.
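One way to do that kind of dump so that every separator is unambiguous is to print the code point of each line-break-like character. A minimal sketch with hypothetical helper names; `newline=""` is passed to `open()` so that Python's universal-newline handling does not silently turn CR or CRLF into LF while reading.

```python
import unicodedata

# Characters that some software treats as a line or paragraph break
BREAK_CHARS = "\n\r\u2028\u2029"


def break_report(text):
    """List (index, 'U+XXXX NAME') for each break-like character in text."""
    return [
        # Control characters have no Unicode name, hence the fallback
        (i, f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
        for i, ch in enumerate(text)
        if ch in BREAK_CHARS
    ]


def report_file(path):
    # Read the file without newline translation so CR/LF are reported as stored
    with open(path, encoding="utf-8", newline="") as f:
        return break_report(f.read())


print(break_report("a\nb\u2029c"))
# [(1, 'U+000A <control>'), (3, 'U+2029 PARAGRAPH SEPARATOR')]
```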

I've attached a text file that shows the problem. If you run the following in the 
scripter console from a document with a text frame named "Text1" you should see 
the problem:

When I save this file, open it with kwrite, and do a
Replace operation to switch \n to <p> (an arbitrary choice), I
see the replacement happen at the end of the 2nd, 3rd, 4th, and 6th
sentences.
I reproduced that behavior with kwrite. It seems only the 2nd and 6th sentences
should have been matched, since only those have an actual \u000A. But I guess
each app does some of its own interpretation of what '\n' means. I suppose
that's similar to what Scribus is doing when it decides that \u2029 and \u000A
should both appear as '\r' in the Python string.
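Python itself shows the same kind of disagreement between tools: `str.splitlines()` treats U+2029 and U+2028 (among others) as line boundaries, while `str.split("\n")` only splits on a real LF. A small illustration with a made-up string; when a script must tell the characters apart, splitting explicitly on the one it cares about is the safer choice.

```python
text = "One.\nTwo.\u2029Three.\u2028Four."

# splitlines() breaks on LF, PS (U+2029) and LS (U+2028) alike
print(text.splitlines())   # ['One.', 'Two.', 'Three.', 'Four.']

# split("\n") only breaks on a real LF
print(text.split("\n"))    # ['One.', 'Two.\u2029Three.\u2028Four.']
```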

So I changed the special marker character I expect in my text files from a
line feed to a zero-width space. That character isn't as semantically
meaningful to me for this purpose as a line feed, but it works.
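A minimal sketch of how a script could locate paragraphs flagged with that ZWSP marker in the text returned by `getText()`. It assumes, as described above, that paragraph breaks come back as '\r', and it also assumes the ZWSP is placed as the first character of the paragraph to be styled; the function name is hypothetical. The resulting (start, length) pairs could then be passed to Scribus's text-selection and paragraph-style scripter functions, but check the scripter documentation for your Scribus version before relying on exact function names or index semantics.

```python
ZWSP = "\u200b"


def marked_paragraph_ranges(text, sep="\r"):
    """Return (start, length) for each paragraph whose first character is ZWSP.

    Assumes paragraphs in `text` are separated by `sep` (CR by default, which
    is what getText() appears to return according to this thread).
    """
    ranges = []
    pos = 0
    for para in text.split(sep):
        if para.startswith(ZWSP):
            ranges.append((pos, len(para)))
        # Advance past this paragraph and the separator that follows it
        pos += len(para) + len(sep)
    return ranges


print(marked_paragraph_ranges("Normal.\r\u200bSpecial.\rNormal again."))  # [(8, 9)]
```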

Thanks for the input.

Greg


___
Scribus Mailing List: scribus@lists.scribus.net
Edit your options or unsubscribe:
http://lists.scribus.net/mailman/listinfo/scribus
See also:
http://wiki.scribus.net
http://forums.scribus.net


--
Silvain Dupertuis
Route de Lausanne 335
1293 Bellevue (Switzerland)
tel. +41-(0)22-774.20.67
mobile +41-(0)79-604.87.52
web: silvain-dupertuis.org <https://perso.silvain-dupertuis.org>