Here's a chunk of Perl that can do what you want. (Sorry, I haven't
actually touched awk in about 15 years.) It takes the filename of your
chosen Tregelles edition as its argument and writes to tregelles.imp.
If you can use OSIS, I have a script for that, too. Or I could send an
OSIS doc that incorporates a bunch of corrections to the available text.
--Chris
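#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl <this script> <tregelles-file>; output goes to tregelles.imp
my $infile = shift @ARGV or die "usage: $0 tregelles-file\n";
open my $treg, '<:encoding(UTF-8)', $infile or die "can't read $infile: $!\n";
open my $tout, '>:encoding(UTF-8)', 'tregelles.imp' or die "can't write tregelles.imp: $!\n";
# Slurp the whole file so the patterns can match across line breaks.
my $blob = do { local $/; <$treg> };
# <Page><Title> line, then $$$ key line, then <SB>: reorder to <SB>, key, <Page><Title>.
$blob =~ s/(<Page[^>]+><Title[^>]+>)\s*(\$\$\$[^\n]+\n)(<SB>)/$3\n$2$1\n/g;
# Any other <SB> opening the line after a $$$ key line: move it to the end of the
# line before the key (this leaves an empty line in front of the key).
$blob =~ s/(\n\$\$\$[^\n]+\n)(<SB>)/$2\n$1/g;
print $tout $blob;
close $tout or die "can't close tregelles.imp: $!\n";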
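To check the two substitutions against Troy's sample without touching any files, they can be wrapped in a small helper and run on an in-memory string. This is only a test sketch: fix_breaks is a hypothetical name, and the first line of the sample is shortened.

```perl
use strict;
use warnings;
use utf8;                              # the sample string contains Greek literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the same two substitutions as the script, applied to a string.
sub fix_breaks {
    my ($blob) = @_;
    # <Page><Title> line, $$$ key line, <SB>: reorder to <SB>, key, <Page><Title>.
    $blob =~ s/(<Page[^>]+><Title[^>]+>)\s*(\$\$\$[^\n]+\n)(<SB>)/$3\n$2$1\n/g;
    # Other <SB> right after a $$$ key line: move it to the end of the line before the key.
    $blob =~ s/(\n\$\$\$[^\n]+\n)(<SB>)/$2\n$1/g;
    return $blob;
}

# Troy's worst-case sample, with the first line shortened.
my $in = "...αἰῶνος.\n"
       . "<Page = 119><Title = ΕΥΑΓΓΕΛΙΟΝ ΚΑΤΑ ΜΑΡΚΟΝ.>\n"
       . "\$\$\$Mark.1.1\n"
       . "<SB>Ἀρχὴ τοῦ εὐαγγελίου\n";

print fix_breaks($in);
```

This prints "...αἰῶνος.", "<SB>", "$$$Mark.1.1", the Page/Title line and then the verse text, one per line, which is the layout Troy asked for. For an <SB> after any other $$$ key, the second substitution turns "a", "$$$Mark.1.2", "<SB>b" into "a<SB>", an empty line, "$$$Mark.1.2", "b".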
Troy A. Griffitts wrote:
Hey guys,
I'm trying to set up a reproducible process to convert the Tregelles data
to a SWORD imp data file. I have a series of sed commands to replace
tags, etc. (Yeah, yeah, I'm sure Perl could do it in one line...)
But anyway, I've got one problem left that I could use some help with:
Here is a real, worst-case sample:
...υντελείας τοῦ αἰῶνος.
<Page = 119><Title = ΕΥΑΓΓΕΛΙΟΝ ΚΑΤΑ ΜΑΡΚΟΝ.>
$$$Mark.1.1
<SB>Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ χριστοῦ υἱοῦ θεοῦ·
It needs to become:
...υντελείας τοῦ αἰῶνος.
<SB>
$$$Mark.1.1
<Page = 119><Title = ΕΥΑΓΓΕΛΙΟΝ ΚΑΤΑ ΜΑΡΚΟΝ.>
Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ χριστοῦ υἱοῦ θεοῦ·
So, the rules in pros:
<SB> which start a new line must be moved to the end of the
non-zero-length line preceding the previous $$$
^<Page = [^>]*><Title = [^>]*>$ lines must be moved down just below the
next $$$ line.
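The sample output above puts <SB> on a line of its own, while the first rule says to append it to the last non-empty line before the $$$ line. Here is a line-by-line sketch that follows the two rules as worded, including skipping empty lines when looking for where <SB> goes. It is an illustration only: apply_rules is a hypothetical name, it works on lines without trailing newlines, and it does not reproduce the sample's separate <SB> line.

```perl
use strict;
use warnings;
use utf8;                               # the sample lines contain Greek literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: applies the two rules, as worded, to a list of lines
# (without trailing newlines) and returns the rewritten list.
sub apply_rules {
    my @out;       # lines emitted so far
    my @held;      # <Page ...><Title ...> lines waiting for the next $$$ line
    my $key_idx;   # index in @out of the $$$ line emitted for the previous input line
    for my $orig (@_) {
        my $line     = $orig;       # copy, so the caller's data is never modified
        my $prev_key = $key_idx;    # defined only if the previous input line was $$$
        undef $key_idx;
        if ($line =~ /^<Page = [^>]*><Title = [^>]*>$/) {
            push @held, $line;      # rule 2: hold it until the next $$$ line
        }
        elsif ($line =~ /^\$\$\$/) {
            push @out, $line;
            $key_idx = $#out;
            push @out, @held;       # held Page/Title lines go just below the key
            @held = ();
        }
        elsif (defined $prev_key && $line =~ /^<SB>/) {
            # rule 1: find the last non-empty line before the $$$ line
            my $i = $prev_key - 1;
            $i-- while $i >= 0 && $out[$i] eq '';
            if ($i >= 0) {          # if there is none, leave the <SB> where it is
                $out[$i] .= '<SB>';
                $line =~ s/^<SB>//;
            }
            push @out, $line;
        }
        else {
            push @out, $line;
        }
    }
    push @out, @held;   # Page/Title lines with no later $$$ line are appended at the end
    return @out;
}

my @lines = ("...αἰῶνος.",
             "<Page = 119><Title = ΕΥΑΓΓΕΛΙΟΝ ΚΑΤΑ ΜΑΡΚΟΝ.>",
             "\$\$\$Mark.1.1",
             "<SB>Ἀρχὴ τοῦ εὐαγγελίου");
print "$_\n" for apply_rules(@lines);
```

On the sample this yields "...αἰῶνος.<SB>", "$$$Mark.1.1", the Page/Title line, then the verse text. To run it over a whole file, one could read it with chomp(my @lines = <$fh>) and print the returned lines joined with newlines. Working line by line avoids slurping, at the cost of keeping the output in memory so <SB> can be appended to an earlier line.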
Any help would be appreciated, preferably with something like awk (I
don't think sed can work across multiple lines, can it?). I guess Perl would be OK too :)
Thanks for any help,
-Troy.
_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page