Title: signature
For what it is worth, I just looked through the USFM standard again, and there is nothing there about requiring or forbidding a newline as part of markup. What Paratext actually does is treat newline and space as equivalent characters most of the time, although when writing out USFM it usually (but apparently not always) starts metadata and paragraph style markers on a new line. Thus Mike's second regular expression, below, is more likely to work all of the time when reading USFM input.
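
As a rough illustration of reading USFM with newline and space treated alike, here is a minimal Python sketch. The function name `tokenize_usfm` and the marker pattern are assumptions made for this example; they are not taken from Paratext or from usfm2osis.py, and real USFM has more cases (nested character markers, attributes) than this handles.

```python
import re

# Hypothetical helper for illustration only.
# A marker is a backslash followed by characters that are neither
# whitespace nor another backslash (so "\nd*" is read as marker "nd*").
MARKER = re.compile(r'\\([^\s\\]+)')

def tokenize_usfm(text):
    """Return (marker, content) pairs, treating every whitespace run alike."""
    matches = list(MARKER.finditer(text))
    tokens = []
    for i, m in enumerate(matches):
        # Content runs from the end of this marker to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse spaces and newlines into single spaces.
        content = ' '.join(text[m.end():end].split())
        tokens.append((m.group(1), content))
    return tokens

sample = ("\\id 1TH My test version \\mt2 The first letter of Paul to the\n"
          "\\mt1 Corinthians\n\\c 1\n\\p\n\\v 1 Hi there, I'm Paul.")

for marker, content in tokenize_usfm(sample):
    print(marker, '|', content)
```

Because the split happens only at backslashes, it makes no difference to this sketch whether a marker follows a space or a newline.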

On 12/10/2013 09:11 AM, Mike Hart wrote:
FYI -- item came up on another mailing list. 

It appears that recently USFM tagging completely ignores the return character in many places, and validates only on the start of another tag. 

That is, USFM2OSIS apparently considers something like (regex)

\\id (...)(.+$) 

to be the ID field; while ParaTExt USFM now considers something more like

\\id (...)([^\\]+) 

to be the ID field. 
(\1 = machine-readable Bible book ID for import; \2 = optional human-readable text explaining what the file is.)
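
To make the difference between the two patterns concrete, the following Python sketch runs both against a short input modeled on the problem file quoted further down in this thread. It assumes Python's `re` module, where the backslash inside the character class has to be doubled, and it uses `re.MULTILINE` so that `$` matches at the end of each line; the variable names are only illustrative.

```python
import re

# Test input: a \mt2 marker sits on the same line as \id.
text = ("\\id 1TH My test version \\mt2 The first letter of Paul to the\n"
        "\\mt1 Corinthians")

# First pattern: the ID field runs to the end of the line.
to_end_of_line = re.compile(r'\\id (...)(.+$)', re.MULTILINE)

# Second pattern: the ID field stops at the next backslash,
# that is, at the start of the next marker.
to_next_marker = re.compile(r'\\id (...)([^\\]+)')

m1 = to_end_of_line.search(text)
m2 = to_next_marker.search(text)

print(m1.group(1), '|', m1.group(2).strip())
print(m2.group(1), '|', m2.group(2).strip())
```

With this input the first pattern pulls the `\mt2` title into the human-readable ID text, while the second stops at "My test version". Note that `[^\\]+` also matches newlines, so the second pattern keeps reading across line breaks until it meets a marker.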

Further discussion describes this 'ignore-return' trend as appearing around other tags as well, with chapters starting without a return after the end of the last verse.


Robert Hunt wrote:
To: Paratext Supporters
Tuesday, December 10, 2013 3:43 AM
Dear all,

    With increasing pressure to get Bibles and even partial Bibles onto mobile devices these days, there is lots of interest in converting from Paratext/USFM files to other formats. Crosswire Bible Society have the Sword Project which has its own binary format for Bible modules which are readable by "front-ends" on many operating systems, including Windows, Linux, Android, etc. However, the current Crosswire usfm2osis.py converter chokes on the following:
\id 1TH My test version \mt2 The first letter of Paul to the
\mt1 Corinthians
\c 1
\s Paul introduces himself
\p
\v 1 Hi there, I'm Paul.
In reading the USFM spec, I can't find confirmation that markers like \mt2 MUST start on a new line. The closest that I can see is:
Most paragraph or poetic markers (like \p, \m, \q# etc.) can be followed immediately by
a verse number (\v) on a new line.
All examples, however, do show these (what I call "newline markers") on new lines.

However, I notice that the last few Paratext versions have a tendency to pop some markers and their text up onto the end of the previous line. I'm pretty sure that PT6 didn't do this. I don't think this is an intentional feature; it seems to be either a bug or some kind of weird side-effect. (It happens often enough that I don't think the user can be blamed for it, especially the way \c markers pop onto the previous line, but of course because Paratext usually displays by chapter, the user can't even see that without changing view mode.)
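
For files that already show this behaviour, one possible workaround is to preprocess them before conversion. The Python sketch below moves a small set of markers back onto their own lines. The `BASES` list and the function name are assumptions chosen for this example, not a list from the USFM standard and not a Paratext feature, so the set would need adjusting for a real project.

```python
import re

# Assumed, illustrative set of markers that should start a line.
# Each may carry a level digit, e.g. \mt2 or \q1.
BASES = ('mt', 'c', 's', 'q', 'p')

# Spaces or tabs, then one of the markers, followed by whitespace or
# end of text (so \ca or \sp are not mistaken for \c or \s).
PATTERN = re.compile(
    r'[ \t]+\\((?:' + '|'.join(BASES) + r')\d*)(?=\s|$)')

def put_markers_on_new_lines(text):
    """Replace the whitespace before a listed marker with a newline."""
    return PATTERN.sub(lambda m: '\n\\' + m.group(1), text)

before = "\\v 3 End of the chapter. \\c 2\n\\p"
print(put_markers_on_new_lines(before))
```

Markers that already start a line are left alone, because the pattern only matches when spaces or tabs precede the backslash.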

So anyway, I have a few questions:
  1. Do you agree that these types of markers (\mt2, \c, \q1) should/must start on a new line?
  2. If so, would it be good to make that clear in the USFM standard (or did I miss something)?
  3. Is having these markers pop up to the end of the previous line a known bug in Paratext?
  4. Is there any way in Paratext to automatically fix this in the USFM files?
  5. Does the Pathway code handle files like this better than the Crosswire converter?
Thanks,
Robert.


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page


--
Your partner in electronic Bible publishing,

MICHAEL JOHNSON
1215 S KIHEI RD STE O # 728
KIHEI HI 96753-5225

USA

Verizon Wireless Mobile: +1 808-333-6921
Skype: kahunapule or +1 719-387-7238
eBible.org
MLJohnson.org
PacificBibles.org
PNGScriptures.org
TokPlesBaibel.org
VanuatuBibles.org
