On 03/01/2012 12:02 PM, David Instone-Brewer wrote:
> I need to get hold of the tagged Chinese Bible texts in a readable form
> because I'm trying to get some Chinese readers to check some issues with the tagging.
>
> Does anyone know how to uncompress the Crosswire nt.bzz and ot.bzz files?

Use mod2imp.

> I tried renaming them as ZIP and GZIP etc but didn't get anywhere.
> Is it a proprietary compression routine, or have I missed something obvious?

David said it is proprietary. It is, but it is not secret. The poorly commented code is readily available for personal study.

We use regular zip (or possibly lzss) on parts of the file and concatenate the parts into the whole. Even if you figured out how to split the file into parts and uncompress them, the parts have no implicit order, and neither do the verses within them. Also, if a module was fixed by appending corrected verses, the incorrect verses were not removed: you'd find both the old and the new text in there. And you'd not find any verse markers to tell one verse from another.
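To make this concrete, here is a minimal Python sketch of a toy container built the way described above: each part is compressed on its own with zlib and the results are concatenated. The layout, the block index, and the function names (build_container, read_block, split_without_index) are hypothetical illustrations, not SWORD's actual file format or API. With an index you can pull out a given block. Without it you can still walk the zlib streams one after another, but you only get anonymous text with nothing saying which book, chapter, or verse it belongs to.

```python
import zlib

# Hypothetical toy container, NOT the real SWORD layout: each block of text
# is compressed independently and the compressed bytes are concatenated.
# A separate block index records (offset, compressed_length) per block.

def build_container(blocks):
    """Compress each block on its own and concatenate the results."""
    data = bytearray()
    index = []  # one (offset, compressed_length) entry per block
    for text in blocks:
        packed = zlib.compress(text.encode("utf-8"))
        index.append((len(data), len(packed)))
        data += packed
    return bytes(data), index

def read_block(data, index, n):
    """Use the block index to slice out and decompress block n."""
    offset, length = index[n]
    return zlib.decompress(data[offset:offset + length]).decode("utf-8")

def split_without_index(data):
    """Recover blocks by walking consecutive zlib streams.

    This works for the toy container, but it yields unlabeled text:
    nothing records which book or verse each block holds.
    """
    parts = []
    rest = data
    while rest:
        d = zlib.decompressobj()
        parts.append(d.decompress(rest).decode("utf-8"))
        rest = d.unused_data  # bytes left over after this stream ended
    return parts

blocks = ["Gen 1:1 ... Gen 1:2 ...", "Matt 1:1 ..."]
data, index = build_container(blocks)
print(read_block(data, index, 1))   # Matt 1:1 ...
print(split_without_index(data))    # both blocks, but with no labels
```

In this toy, the labels "Gen" and "Matt" only survive because they happen to be part of the sample text itself; as noted above, real module data carries no such verse markers.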

Even with an uncompressed module, whose dat file is readable, the order of the data is no indicator of the order of the text. And, again, you'd not find any verse markers.
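A similar toy sketch, again assuming a made-up layout rather than SWORD's real one, shows why a readable dat file still doesn't give you the text: the order lives in a separate index, and an appended correction leaves the stale bytes behind. The function names append_verse and read_verse and the keys are hypothetical.

```python
# Hypothetical toy layout, NOT SWORD's actual file format: a raw "dat"
# byte buffer plus an index mapping each verse key to (offset, length).

def append_verse(dat, idx, key, text):
    """Append text to dat and point the index entry for key at it.

    Any earlier bytes for the same key stay in dat; only the index moves.
    """
    raw = text.encode("utf-8")
    idx[key] = (len(dat), len(raw))
    dat.extend(raw)

def read_verse(dat, idx, key):
    """Look up a verse through the index, never by position in dat."""
    offset, length = idx[key]
    return bytes(dat[offset:offset + length]).decode("utf-8")

dat = bytearray()
idx = {}
# In this toy, verses are written out of canonical order...
append_verse(dat, idx, "John.1.2", "B")
append_verse(dat, idx, "John.1.1", "A-typo")
# ...and a later fix is appended rather than overwriting the old text.
append_verse(dat, idx, "John.1.1", "A")

canonical = ["John.1.1", "John.1.2"]
print([read_verse(dat, idx, k) for k in canonical])  # ['A', 'B']
print(bytes(dat))                                    # b'BA-typoA'
```

Reading the dat buffer front to back yields b'BA-typoA': out of order, with the stale verse still present and nothing marking where one verse ends and the next begins. Only something that walks the index recovers the text correctly, which is the job the export utilities do.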

The only way to work with the text is either to get the original from its source (highly recommended) or to use one of our export utilities, such as mod2imp. Going to the source also lets you work with the text's "owner" to feed back corrections, which would ultimately get back to us.

Each module's .conf file gives information about the source of the text.

In Him,
    DM

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page