On 03/01/2012 12:02 PM, David Instone-Brewer wrote:
> I need to get hold of the tagged Chinese Bible texts in a readable form
> because I'm trying to get some Chinese readers to check some issues
> with the tagging.
> Does anyone know how to uncompress the Crosswire nt.bzz and ot.bzz files?
Use mod2imp.
> I tried renaming them as ZIP and GZIP etc but didn't get anywhere.
> Is it a proprietary compression routine, or have I missed something
> obvious?
David said it is proprietary. It is, but it is not secret. The poorly
commented code is readily available for personal study.
We use regular zip (or possibly lzss) on parts of the file and
concatenate the parts into the whole. Even if you figured out how to
split it into parts and uncompress them, the parts have no implicit order,
and neither do the verses within them. Also, when a module is fixed by
appending corrected verses, the incorrect verses are not removed: you'd
find both the old and the new in there. And you'd not find any verse
markers to help you tell one verse from another.
The same holds for an uncompressed module, whose dat file is readable:
the order of the data is no indicator of the order of the text, and
there are no verse markers.
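To illustrate why reading the data file on its own is misleading, here is a toy model in Python. It is only a sketch of the general idea (a separate index pointing into an append-only data blob) and is not the actual SWORD on-disk layout; the names data, index, write_verse and read_verse are made up for the example.

```python
# Toy model only: this is NOT the real SWORD file format.
data = bytearray()   # stands in for the data file
index = {}           # verse key -> (offset, length); stands in for the index


def write_verse(key, text):
    """Append the text to the data blob and point the index at the new copy."""
    raw = text.encode("utf-8")
    index[key] = (len(data), len(raw))
    data.extend(raw)


def read_verse(key):
    """Look the verse up through the index, never by position in the blob."""
    offset, length = index[key]
    return bytes(data[offset:offset + length]).decode("utf-8")


# Verses are written out of canonical order, and one is corrected later.
write_verse("John 1:2", "second verse")
write_verse("John 1:1", "first verse, with a typo")
write_verse("John 1:1", "first verse, corrected")

print(read_verse("John 1:1"))   # the index resolves to the corrected copy
print(data.decode("utf-8"))     # the raw blob still holds the stale copy too
```

In this model the raw blob contains the stale and the corrected text side by side, in write order, with nothing to mark where one verse ends and the next begins. Only the index makes sense of it.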
The only way to work with the text is either to get the original from
the source (highly recommended) or use one of our export utilities. By
using the source you can work with the "owner" to feed back corrections,
which would ultimately get back to us.
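As a rough sketch of the export route: mod2imp writes a module out as plain text in imp format, which as far as I recall puts each key on its own line prefixed with "$$$", followed by the entry text. Assuming that convention, and assuming you ran something like "mod2imp ChiUns > chiuns.imp" (the module name and file name here are only placeholders), a few lines of Python can split the output into verses for your readers. The sample lines below are invented for the example.

```python
def parse_imp(lines):
    """Split imp-format lines into a dict of key -> entry text.

    Assumes each entry starts with a line beginning with "$$$" that
    carries the key, followed by one or more lines of text.
    """
    entries = {}
    key = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("$$$"):
            key = line[3:].strip()
            entries[key] = []
        elif key is not None:
            entries[key].append(line)
    return {k: "\n".join(v).strip() for k, v in entries.items()}


# Invented sample; real input would come from open("chiuns.imp", encoding="utf-8").
sample = [
    "$$$John 1:1\n",
    "text of the first verse\n",
    "$$$John 1:2\n",
    "text of the second verse\n",
]

entries = parse_imp(sample)
for key, text in entries.items():
    print(key, "|", text)
```

Whatever markup the module carries for the tagging would come through inside the entry text, so the readers would see the tags alongside the Chinese.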
Each module's conf gives information regarding the source of the text.
In Him,
DM
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page