[sword-devel] USFM2OSIS

2013-12-10 Thread Mike Hart
FYI -- this item came up on another mailing list. It appears that, recently, USFM tagging completely ignores the return character in many places and validates only on the start of another tag. That is, USFM2OSIS apparently considers something like (regex) \\id (...)(.+$) to be the ID field;
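To make the concern concrete, here is a minimal Python sketch of the regex quoted above. It is an illustration only, not code taken from usfm2osis.py. It assumes Python's `re` module with `re.MULTILINE`, so that `$` matches at every line end; `id_field` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# The pattern quoted above; re.MULTILINE lets "$" match at each line end
# rather than only at the end of the whole string.
ID_PATTERN = re.compile(r"\\id (...)(.+$)", re.MULTILINE)

def id_field(usfm: str) -> Optional[Tuple[str, str]]:
    """Return (book code, rest of the \\id line), or None if there is no \\id."""
    m = ID_PATTERN.search(usfm)
    return (m.group(1), m.group(2)) if m else None

# One marker per line: the match stops at the newline.
print(id_field("\\id GEN Sample text\n\\c 1\n\\v 1 In the beginning"))
# Everything on one line: the rest of the text is swallowed into the ID field.
print(id_field("\\id GEN Sample text \\c 1 \\v 1 In the beginning"))
```

The second call shows the problem: if a file relies on the next tag, not the newline, to end `\id`, a line-anchored pattern like this one captures the chapter and verse markers as part of the ID.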

Re: [sword-devel] USFM2OSIS

2013-12-10 Thread Kahunapule Michael Johnson
For what it is worth, I just looked through the USFM standard again, and there is nothing there about requiring or forbidding a newline as part of markup. What Paratext actually does is to regard newline and space as equivalent characters most of
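For illustration, a rough Python sketch of a tokenizer that treats newline and space as equivalent, as described above for Paratext. This is an assumption about the behaviour, not Paratext's code: the marker pattern is simplified, `tokenize` is a hypothetical name, and nested character styles are not handled.

```python
import re
from typing import List, Tuple

# Assumed marker shape: backslash, optional "+", letters/digits, optional "*".
# A marker's text runs up to the next backslash; newlines are plain whitespace.
TOKEN = re.compile(r"\\(\+?[A-Za-z0-9]+\*?)\s*([^\\]*)")

def tokenize(usfm: str) -> List[Tuple[str, str]]:
    """Split USFM into (marker, text) pairs, treating newline like a space."""
    tokens = []
    for m in TOKEN.finditer(usfm):
        # Collapse any run of spaces/newlines to a single space and trim.
        text = " ".join(m.group(2).split())
        tokens.append((m.group(1), text))
    return tokens

print(tokenize("\\id GEN Genesis \\c 1\n\\v 1 In the\nbeginning"))
```

With this model, a `\c` on the same line as `\id` starts a new marker; under a line-terminated reading of `\id`, it would instead be part of the ID text.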

Re: [sword-devel] USFM2OSIS

2013-12-10 Thread Robert Hunt
That's correct. Mike's regular expression will work more reliably for any USFM metadata and paragraph style markers that don't allow character formatting within the field. But maybe that's actually ONLY the metadata fields like id, ide, h, toc fields, etc.

Re: [sword-devel] USFM2OSIS

2013-12-10 Thread Chris Little
Per the USFM Reference, in the specific cases of \id and a very small number of other tags like \rem, the tag does end with the newline and it is not appropriate to interpret any following text as a new tag. It's not that usfm2osis.py requires \mt1, \c, or \q1 to start on a new line, it's that
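Under this reading, a parser has to special-case a few markers. A minimal Python sketch, assuming only `\id` and `\rem` (the two named in this message) end at the newline while every other marker ends at the next marker; `LINE_TERMINATED` and `parse` are hypothetical names, and this is not taken from usfm2osis.py.

```python
import re
from typing import List, Tuple

# Markers whose data ends at the newline. Only "id" and "rem" come from the
# message above; adding other metadata markers here would be an assumption.
LINE_TERMINATED = {"id", "rem"}

# Simplified marker shape; only spaces/tabs after the marker are consumed.
MARKER = re.compile(r"\\(\+?[A-Za-z0-9]+\*?)[ \t]*")

def parse(usfm: str) -> List[Tuple[str, str]]:
    """Split USFM into (marker, text) pairs under the line-terminated reading."""
    tokens = []
    pos = 0
    while True:
        m = MARKER.search(usfm, pos)
        if m is None:
            break
        name, start = m.group(1), m.end()
        # Line-terminated markers stop at "\n"; all others at the next "\".
        stop = "\n" if name in LINE_TERMINATED else "\\"
        end = usfm.find(stop, start)
        if end == -1:
            end = len(usfm)
        tokens.append((name, " ".join(usfm[start:end].split())))
        pos = end
    return tokens

print(parse("\\id GEN test \\c 1\n\\c 1\n\\v 1 In the beginning"))
```

Here the `\c 1` on the `\id` line is kept as ID text, while the `\c 1` on its own line starts a chapter, which is exactly the behaviour a file that ignores newlines would trip over.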

Re: [sword-devel] USFM2OSIS

2013-12-10 Thread Kahunapule Michael Johnson
Unfortunately, the USFM reference is ambiguous on this point. Although the examples for \id and \rem both show information following the marker all on one line, I can find nothing in the standard that says either marker's data is terminated by a newline. This is actually