Just a note that the zero-width space, ZWSP (U+200B), actually exists for a very good
semantic reason.
It is used to mark word boundaries in Asian languages such as Thai or Lao, where most words are
monosyllabic, vowels are written around the initial consonant, and no spaces are put
between words.
Nowadays, most software can break words using an algorithmic
approach (I am not sure where the information is encoded!). That was not the case 3-4 years
ago. In any case, having a ZWSP to separate words helps ensure proper text flow when
paragraphs are reformatted.
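
To illustrate the idea, here is a minimal Python sketch of line wrapping that breaks only where a ZWSP appears. The function name wrap_at_zwsp and the sample string are made up for this example, and len() counts code points rather than display width, so this is a simplification and not what a real layout engine does.

```python
ZWSP = "\u200b"  # zero-width space: invisible, but marks an allowed break point


def wrap_at_zwsp(text, width):
    """Greedily wrap text, breaking only at ZWSP positions.

    Hypothetical helper for illustration; width is measured in code points.
    """
    lines, current = [], ""
    for word in text.split(ZWSP):
        # Start a new line if adding this word would exceed the width.
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


# Thai sample with ZWSP between the words; each output line ends on a word boundary.
sample = "ภาษา" + ZWSP + "ไทย" + ZWSP + "สวย"
for line in wrap_at_zwsp(sample, 7):
    print(line)
```

Without the ZWSP characters, this sketch would have no break points at all and would return the whole string as one line.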
Silvain
On 01.01.22 at 03:18, Matt Miller wrote:
On Fri, Dec 31, 2021, at 15:59, Gregory Pittman wrote:
On 12/31/21 14:19, Matt Miller wrote:
On Fri, Dec 31, 2021, at 07:11, Gregory Pittman wrote:
On 12/30/21 20:49, Matt Miller wrote:
I'm loading a text frame from a UTF-8 encoded text file, and within my Scribus
Python code I want to search for the standard newline character, ASCII value
10. When I see an ASCII 10 as the line separator I want to apply a special
paragraph style to the following paragraph. Most paragraphs end with the
Unicode paragraph separator character, \u2029, and in those cases the default
paragraph style is fine.
My problem is that both of these characters match '\r' when I use
re.search in Python. Also, if I select either line separator character and then call
getText(), I get a '\r' no matter what. I've confirmed that my file encoding is
UTF-8. What am I missing? How can I search for a simple '\n' character?
Hi Matt,
You don't say what OS you're using.
Linux
Maybe running dos2unix on the text would help.
Well, my ideal workflow is one where the contents of the text file instruct
Scribus what paragraph and character styles to use throughout the document. So,
I'm careful to put exactly the characters I want into the text file. I've been
largely successful, but now I've run into a case where it seems Scribus (or
Python) is losing information when I load the file. From inside Scribus I can't
distinguish between a Unicode paragraph separator, \u2029, and a simple line
feed, \u000A.
I'm able to open my text file from the Python console and dump it out to see that a newline is displayed as
"\n" and a paragraph separator is displayed as "\u2029". So I suspect the problem
is with Scribus, or with how I'm using it. I'm loading the file using insertHtmlText(), but I get the same bad
behavior from the GUI when I do "Content | Get Text..." and load the file manually.
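
A check of that kind, done outside Scribus, might look like the sketch below. The helper names read_raw and find_separators are made up for this example. Passing newline="" to open() keeps Python from translating line endings on read, so the characters reported are the ones actually stored in the file.

```python
# Characters that can end a line or paragraph, with readable names.
SEPARATORS = {
    "\n": "LINE FEED",
    "\r": "CARRIAGE RETURN",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def read_raw(path):
    """Read a UTF-8 file without any newline translation (hypothetical helper)."""
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()


def find_separators(text):
    """Return (offset, name) for every separator character found in text."""
    return [(i, SEPARATORS[ch]) for i, ch in enumerate(text) if ch in SEPARATORS]


# In-memory demonstration; on a real file this would be find_separators(read_raw(path)).
for offset, name in find_separators("first\nsecond\u2029third"):
    print(offset, name)
```

If both separators show up as distinct entries here but come back as '\r' from the text frame, the information is being lost after the file is read, not in the file itself.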
I've attached a text file that shows the problem. If you run the following in the
scripter console from a document with a text frame named "Text1" you should see
the problem:
When I save this file and open it with kwrite, then do a
Replace operation to switch \n to <p> (an arbitrary choice), I
see the replacement happen at the end of the 2nd, 3rd, 4th, and 6th
sentences.
I duplicated that behavior with kwrite. It seems only the 2nd and 6th sentences
should have been replaced, since only those have an actual \u000A. But, I guess
each app does some of its own interpretation as to what '\n' means. I suppose
that's similar to what Scribus is doing when it decides that \u2029 and \u000A
should both appear as '\r' in the Python string.
So, I changed my special marker character that I expect in my text files from a
line feed to a zero-width space. That character isn't as semantically
meaningful to me for this purpose as a line feed, but it works.
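
A rough sketch of how such a marker could be used once the text is back in Python as a string where every separator has become '\r'. It assumes the ZWSP is the first character of each paragraph that needs the special style, which is only one possible placement; the function name plan_styles and the style names "Special" and "Default" are made up for this example.

```python
ZWSP = "\u200b"  # marker character that survives loading, unlike the line feed


def plan_styles(frame_text, special="Special", default="Default"):
    """Return a (start, length, style) tuple for each paragraph.

    Assumes paragraphs are separated by '\r' and that a leading ZWSP
    marks a paragraph needing the special style (both are assumptions).
    """
    plan, pos = [], 0
    for para in frame_text.split("\r"):
        style = special if para.startswith(ZWSP) else default
        plan.append((pos, len(para), style))
        pos += len(para) + 1  # step over the '\r' separator
    return plan


for start, length, style in plan_styles("one\r" + ZWSP + "two\rthree"):
    print(start, length, style)
```

Inside Scribus, the offsets could then be used to select each paragraph and apply the chosen style through the scripter; that step is left out here because it depends on the Scribus version in use.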
Thanks for the input.
Greg
___
Scribus Mailing List:scribus@lists.scribus.net
Edit your options or unsubscribe:
http://lists.scribus.net/mailman/listinfo/scribus
See also:
http://wiki.scribus.net
http://forums.scribus.net
--
Silvain Dupertuis
Route de Lausanne 335
1293 Bellevue (Switzerland)
tél. +41-(0)22-774.20.67
portable +41-(0)79-604.87.52
web: silvain-dupertuis.org <https://perso.silvain-dupertuis.org>