I typed that first message out very quickly, and it is somewhat flawed. If anyone
uses this info for a permanent fix... The USFM manual does indicate the presence
of a space after the 3-letter code...
^\\id (...) ([^\\]+)
is a better interpretation of the spec.
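
For anyone who wants to try this out, here is a minimal Python sketch of the pattern above, written with the backslashes escaped for the re module. The helper name parse_id and the sample strings are made up for illustration; this is not taken from usfm2osis.py.

```python
import re

# The pattern from the message above: marker, space, 3-letter code, space,
# then everything up to (but not including) the next backslash.
ID_PATTERN = re.compile(r"^\\id (...) ([^\\]+)", re.MULTILINE)


def parse_id(usfm_text):
    """Return (book_code, description) for the first id marker, or None."""
    match = ID_PATTERN.search(usfm_text)
    if match is None:
        return None
    # The captured text may end with a space or newline before the next marker.
    return match.group(1), match.group(2).strip()


# Made-up sample: the next marker follows on the same line.
same_line = "\\id GEN Sample Bible \\h Genesis"
# Made-up sample: the next marker follows on its own line.
next_line = "\\id GEN Sample Bible\n\\h Genesis"

print(parse_id(same_line))
print(parse_id(next_line))
```

Both samples give the same result, since the field ends at the next backslash rather than at the end of the line. Note that, as written, the pattern needs a space and some text after the 3-letter code, so a bare "\id GEN" followed directly by a newline would not match.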
The spec does not declare 3
FYI -- item came up on another mailing list.
It appears that recently USFM tagging completely ignores the return character
in many places, and validates only on the start of another tag.
That is, USFM2OSIS apparently considers something like (regex)
\\id (...)(.+$)
to be the ID field;
For what it is worth, I just looked
through the USFM standard again, and there is nothing there about
requiring or forbidding a newline as part of markup. What Paratext
actually does is to regard newline and space as equivalent
characters most of
That's correct. Mike's regular expression will work more reliably
for any USFM metadata and paragraph style markers that don't allow
character formatting within the field. But maybe that's actually
ONLY the metadata fields like id, ide, h, toc fields, etc.
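
To illustrate the distinction, here is a rough Python sketch that applies the same "read up to the next backslash" idea to other markers. The helper names field_pattern and read_field and the sample text are hypothetical, and I am assuming the regular expression in question has this general shape.

```python
import re


def field_pattern(marker):
    # Marker at the start of a line, one space, then text up to the next backslash.
    return re.compile(r"^\\" + re.escape(marker) + r" ([^\\]+)", re.MULTILINE)


def read_field(marker, usfm_text):
    """Return the text of the first field with this marker, or None."""
    match = field_pattern(marker).search(usfm_text)
    return match.group(1).strip() if match else None


# Made-up sample: two metadata fields, then a poetry line with character markup.
sample = (
    "\\h Genesis\n"
    "\\toc1 The First Book of Moses\n"
    "\\q1 Praise the \\nd Lord\\nd* forever"
)

print(read_field("h", sample))     # whole field is captured
print(read_field("toc1", sample))  # whole field is captured
print(read_field("q1", sample))    # stops at the \nd character marker
```

The \h and \toc1 fields come back complete, but the \q1 line is cut off at "Praise the" because the \nd character marker introduces a backslash inside the field. That is why this approach seems safe only for fields that cannot contain character formatting.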
Per the USFM Reference, in the specific cases of \id and a very small
number of other tags like \rem, the tag does end with the newline and it
is not appropriate to interpret any following text as a new tag. It's
not that usfm2osis.py requires \mt1, \c, or \q1 to start on a new line,
it's that
Unfortunately, the USFM reference is ambiguous on this point. Although the
examples for \id and \rem both show information following the marker all on one
line, there is nowhere in the standard that I can find that says that
either marker's data is terminated by a newline. This is actually