When I tried to write a similar parser some years ago (or rewrite the libsword parser(s) in Sword++), I discovered to my dismay that the wiki page is quite insufficient. The lack of a formal specification for the configuration format leaves several serious ambiguities open for anyone who wants to write a parser. Some examples:

  * How should different parsing errors be handled?
  * What are the phases for parsing? Should the output of each phase be a single string, or a list of strings parsed separately by next phases (e.g. lines in case of continuations)?
  * Should continuations be handled in a phase before or after parsing RTF? How should "\\\\\n\n" be parsed?
  * How to include a literal backslash? If escaped, in which phase of parsing?
  * Should official Microsoft RTF syntax rules be used for RTF control word tokenization and semantics? Which version(s) of RTF exactly? The rules on the Crosswire wiki page might differ from RTF specs.
  * The wiki page states that "using the actual UTF-8 character is preferred" to RTF "\u" escapes, but the RTF syntax only allows 7-bit ASCII characters. Does this mean that all UTF-8 characters should be converted to "\u"-style RTF escapes before handing off to the RTF parser? Since the "\u" escapes can only handle code points U+0000 to U+FFFF, how should other UTF-8 code points beyond U+FFFF be handled?
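
To make the last question concrete, here is one possible reading, sketched in Python: split every non-ASCII character into UTF-16 code units and emit one "\u" escape per unit, so that a code point beyond U+FFFF becomes a surrogate pair. This is only an assumption modelled on how RTF writers seem to handle such characters in practice; the wiki page does not specify it. The function name rtf_unicode_escape and the "?" fallback character are made up for this example.

```python
def rtf_unicode_escape(text: str) -> str:
    """Replace non-ASCII characters with RTF \\uN escapes (illustrative only)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # 7-bit ASCII passes through unchanged; escaping of backslashes
            # and braces is deliberately left out of this sketch.
            out.append(ch)
            continue
        # Encode as UTF-16: one code unit for U+0080..U+FFFF,
        # two (a surrogate pair) for anything beyond U+FFFF.
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            # RTF's \uN parameter is a signed 16-bit decimal number,
            # so units above 0x7FFF are written as negative values.
            if unit > 0x7FFF:
                unit -= 0x10000
            # "?" is the assumed one-character fallback for readers
            # that do not understand \u.
            out.append(f"\\u{unit}?")
    return "".join(out)
```

With this sketch, "é" (U+00E9) becomes "\u233?", while U+1F600 becomes the two escapes "\u-10179?\u-8704?". Whether a conforming conf parser should accept or produce such pairs is exactly the kind of point a formal specification would have to settle.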

The original libsword implementation also seemed to suffer from various issues and was not of much help to me, thus I eventually ended up abandoning this effort.


On 16.04.24 10:20, domcox wrote:

Only a very small, restricted subset of RTF markup is supported, see:

"David \"Judah's Shadow\" Blue" <yudahssha...@gmx.com> writes:

I'm working on an info command to display some basic info about modules, and I ran into the fact that, at least in the About entry, the conf file can contain RTF formatting. As it stands I strip out \pard, replace \par with \n, and strip out the tag portion of any anchor/link tags found. My question is, are there any other tags that are likely to appear in conf entries that I should be either handling or stripping (since my front end does no formatting of text
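
The stripping approach described in the quoted message can be sketched roughly as follows in Python. This is a guess at the behaviour, not the poster's actual code: the function name strip_conf_rtf is invented, the anchor tags are assumed to be HTML-style `<a ...>text</a>`, and a single space after a control word is assumed to belong to that control word, as in RTF proper.

```python
import re

def strip_conf_rtf(value: str) -> str:
    """Reduce a conf entry to plain text (illustrative sketch only)."""
    # Drop \pard. The negative lookahead stops the pattern from matching
    # a longer control word that merely starts with "pard".
    text = re.sub(r"\\pard(?![a-z]) ?", "", value)
    # Turn \par into a newline. The lookahead also keeps \par from
    # matching inside \pard, which a plain string replace would do.
    text = re.sub(r"\\par(?![a-z]) ?", "\n", text)
    # Keep the visible link text and drop the surrounding anchor tag
    # (assumed HTML-style syntax).
    text = re.sub(r"<a\b[^>]*>(.*?)</a>", r"\1", text,
                  flags=re.IGNORECASE | re.DOTALL)
    return text
```

For example, `strip_conf_rtf(r"Line one\par Line two\pard")` gives "Line one", a newline, then "Line two". Any other control word is passed through untouched, which is where the question about additional tags comes in.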

sword-devel mailing list: sword-devel@crosswire.org
