Hi Troy,

Yes, you're probably right about the lack of a readily available tool for direct conversion.

Had I been tackling the task, I might have considered these steps:

1. Open each HTML file using MS Word, save each file as RTF.
2. Open each RTF file using WordPad, save again as RTF (smaller and simpler file structure).
3. Create & run a script to process the RTF tags for the italics attribute and for red font colour.
4. Open the processed RTF files using WordPad, save as Unicode text (encoded as UTF-16 LE).
5. Use a suitable editor to open the Unicode text files and change the encoding to UTF-8 (without BOM).

After step 5 you'd have something similar to where you began converting plain text to OSIS, but with some ingenuity at step 3, you'd also have some elementary markup for italics and red letters that survives the complete loss of formatting attributes at step 4.

During my Go Bible activities, I've used this approach more times than I can recall. /The steepest part of the learning curve is getting used to the format of RTF files when viewed by an ordinary text editor/.

After step 5, it's often simpler to do the next conversion to USFM, and then use usfm2osis.pl.

Best regards,

David
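
P.S. To make steps 3 and 5 more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the script actually used for Go Bible. It assumes the WordPad-saved RTF switches italics with `\i` / `\i0` and colours with `\cfN`, that the red text is pure red (255, 0, 0) in the colour table, and that formatting is not scoped by `{...}` groups or reset by `\plain`. The marker strings `<i>`, `</i>`, `<red>` and `</red>` and all function names are hypothetical choices.

```python
import re

# Hypothetical plain-text markers; choose strings that cannot occur in the text.
MARKERS = {"i_on": "<i>", "i_off": "</i>", "red_on": "<red>", "red_off": "</red>"}

# Matches \i or \cf, an optional numeric parameter and the optional delimiter
# space. The lookahead rejects longer control words such as \insrsid or \cfpat.
CONTROL = re.compile(r"(?<!\\)\\(i|cf)(\d*)(?![A-Za-z0-9]) ?")


def red_index(rtf):
    """Return the colour-table index of pure red (255, 0, 0), or None."""
    table = re.search(r"\{\\colortbl\s*([^}]*)\}", rtf)
    if table is None:
        return None
    # Entries are separated by ';' and entry 0 is usually the empty default.
    for index, entry in enumerate(table.group(1).split(";")):
        rgb = re.search(r"\\red(\d+)\\green(\d+)\\blue(\d+)", entry)
        if rgb and tuple(map(int, rgb.groups())) == (255, 0, 0):
            return index
    return None


def mark_up(rtf):
    """Step 3: insert text markers after the italic and red-colour controls."""
    red = red_index(rtf)
    state = {"i": False, "red": False}

    def replace(match):
        word, param = match.group(1), match.group(2)
        if word == "i":
            key, turn_on = "i", param != "0"        # \i or \i1 on, \i0 off
        else:
            key, turn_on = "red", red is not None and param == str(red)
        marker = ""
        if turn_on != state[key]:                   # emit only on a change
            marker = MARKERS[key + ("_on" if turn_on else "_off")]
            state[key] = turn_on
        # Keep the control word, re-add its delimiter space, then the marker.
        return "\\" + word + param + " " + marker

    return CONTROL.sub(replace, rtf)


def utf16_to_utf8(src, dst):
    """Step 5: re-encode a UTF-16 text file as UTF-8 without a BOM."""
    # The "utf-16" codec consumes the BOM; newline="" keeps line endings as-is.
    with open(src, encoding="utf-16", newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)
```

Because the markers are ordinary text placed after the control words, they should still be present when WordPad discards the formatting at step 4. Real files may need extra handling for group-scoped formatting and for a red that is not exactly 255, 0, 0.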