On May 12, 2007, at 6:05 PM, Chris Little wrote:

> DM Smith wrote:
>> On May 12, 2007, at 4:15 PM, Chris Little wrote:
>>>> -Line-feed and tabulations are not considered as space: if you look at
>>>> Genesis 1:2, it should be "Et l'Esprit de Dieu" and it is displayed as
>>>> "Etl'Esprit de Dieu" (a space is missing).
>>>
>>> This looks like a problem with osis2mod, but the OSIS file itself could
>>> use some whitespace cleanup. There is a lot of stray whitespace, for
>>> example at ends of lines, before </p>. The problem in Genesis 1:2 could
>>> be handled by deleting changing the linefeed + tab to a single space.
>>
>> I think this is rather a "feature". osis2mod is trimming "extraneous"
>> whitespace. I think this was to handle input that is pretty. I'm in
>> favor of retaining all whitespace. My opinion is that an osis document
>> should be what is actually wanted. I've got some changes I need to make
>> because of the NASB (osis2mod is not handling stuff between verses
>> well). I can change this too if it is what people want.
>
> It should trim whitespace in favor of smaller, simpler files. But here,
> it sounds like \n and \t are being deleted rather than something like
> s/[\s]+/ /.
>
> I'm surprised we're doing this, but I'm just judging by the reported
> symptoms, rather than looking at the osis2mod code itself.
And I was going by memory, so shame on me. I just went and looked at the code. osis2mod does not get rid of any "extraneous" whitespace itself, but it calls FileMgr::getLine, which trims whitespace from the beginning and the end of each line. I also think there is a bug in its handling of line endings: in some places it checks only for 13 (CR), in others only for 10 (LF), and in yet others for both.

From what I can determine, FileMgr::getLine is called by swcofig, osis2mod and imp2gbs. I think this should be replaced with a call to std::getline, which is already used by imp2ld, imp2vs, and xml2gbs. (For completeness, it should be noted that vpl2mod defines its own readline, which reads one character at a time into a buffer.)

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
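
To make the std::getline suggestion concrete, here is a minimal sketch of a line reader that keeps leading and trailing whitespace and treats LF and CRLF the same way. It is not taken from the SWORD sources: readLineKeepWhitespace and readAllLines are hypothetical names used only for illustration, and the sketch assumes input that ends lines with LF or CRLF (a bare CR is not treated as a line ending).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of SWORD): read one line with std::getline,
// remove only a trailing '\r' so that LF and CRLF input behave the same,
// and leave every other whitespace character untouched.
bool readLineKeepWhitespace(std::istream &in, std::string &line) {
    if (!std::getline(in, line)) {
        return false;  // end of input or stream error
    }
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();  // CRLF input: drop the CR left behind by getline
    }
    return true;
}

// Hypothetical helper: collect every line of a stream using the reader above.
std::vector<std::string> readAllLines(std::istream &in) {
    std::vector<std::string> lines;
    std::string line;
    while (readLineKeepWhitespace(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Fed the text "Et\tl'Esprit \r\n  de Dieu\n" through a std::istringstream, readAllLines would return two lines, with the tab, the trailing space on the first line and the two leading spaces on the second all intact. Whether any of that whitespace should later be collapsed to a single space is a separate decision for the importer.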
