Dear SWORD developers,
From: L.Allan-pbio
...
I can think of several reasons for rawtext (non-compressed):
...
2. Search speed can be significantly faster. ...
That may be true for zText. However, other compression formats are faster to search than plain text.
3. It is easier to debug/examine a module. You can use a text editor ...
I think this is the overwhelming reason in favor of plain text. http://c2.com/cgi/wiki?PowerOfPlainText has convinced me to stick with plain text format (and plain-text-like formats, such as HTML) if at all possible.
From: L.Allan-pbio
...
I defined a sourceforge project BibleDb that would be optimized for Bible decompression/decryption/search speed (not necessarily for compression ratio).
...
BibleDB is only in pre-alpha stage. http://sourceforge.net/project/admin/?group_id=117234
Interesting. I will look at this soon. Perhaps we can apply some of the ideas from this article: "Compression: A Key for Next-Generation Text Retrieval Systems" by Nivio Ziviani, Edleno Silva de Moura, Gonzalo Navarro, and Ricardo Baeza-Yates, in _Computer_ magazine, November 2000.
Their decompressor takes 1, 2, or 3 whole bytes of compressed data and decompresses them (using a vocabulary list) into a whole word. This makes many kinds of searches *much* faster. One can directly search the compressed text for words or phrases, which turns out to be faster than searching uncompressed text. (Rather than *uncompressing* the entire Bible and comparing the uncompressed Bible to the search string, we can *compress* just the search string, then compare the compressed Bible directly to the compressed search string.)
The article also has lots of other ideas about compressing indexes and approximate-match searching.
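
A rough sketch of searching the compressed text directly, in Python. This is not the article's actual code (the article uses byte-oriented Huffman codes); as a simplified stand-in it ranks words by frequency and writes each rank as 1, 2, or more bytes, setting the high bit only on the first byte of each codeword. All function names here are hypothetical, and the tokenizer simply splits on whitespace, so spacing and punctuation handling are not addressed.

```python
from collections import Counter

def build_vocab(text):
    """Rank tokens by frequency, most common first (ties broken alphabetically)."""
    counts = Counter(text.split())
    return sorted(counts, key=lambda w: (-counts[w], w))

def encode_rank(rank):
    """Codeword for a rank: 1 byte for ranks 0..127, 2 bytes for the next
    128**2 ranks, and so on. Only the first byte has its high bit set."""
    length, span = 1, 128
    while rank >= span:
        rank -= span
        length += 1
        span = 128 ** length
    digits = [(rank >> (7 * i)) & 0x7F for i in reversed(range(length))]
    digits[0] |= 0x80  # tag the start of the codeword
    return bytes(digits)

def compress(text, codes):
    """Concatenate the codeword of every token."""
    return b"".join(codes[w] for w in text.split())

def find_phrase(compressed, phrase, codes):
    """Compress the phrase, then look for it in the compressed bytes.
    Returns the byte offsets of all matches."""
    words = phrase.split()
    if not words or any(w not in codes for w in words):
        return []  # a word outside the vocabulary cannot occur in the text
    pattern = b"".join(codes[w] for w in words)
    hits, start = [], compressed.find(pattern)
    while start != -1:
        end = start + len(pattern)
        # Accept only if the match ends on a codeword boundary; otherwise the
        # last pattern codeword is merely a prefix of a longer codeword.
        if end == len(compressed) or compressed[end] & 0x80:
            hits.append(start)
        start = compressed.find(pattern, start + 1)
    return hits

text = "in the beginning god created the heaven and the earth"
vocab = build_vocab(text)
codes = {w: encode_rank(r) for r, w in enumerate(vocab)}
compressed = compress(text, codes)
print(find_phrase(compressed, "the earth", codes))  # prints [8]
```

Because the pattern begins with a tagged start byte, a match can only begin at a codeword boundary; the extra check after each match handles the end of the pattern.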
From: L.Allan-pbio
My limited experience is that if you don't have a large block of data (book), then the compression ratio isn't very good.
That's very true. But I hope you can see that:
* Ziviani's technique *does* have a large block of data, so potentially the compression ratio can be good. To give the best compression, the compressor scans the entire Bible (in order to pick out the most-common words and give them one-byte representations).
* Ziviani's technique lets you point to any word in the text with a normal (byte) pointer and start decompressing immediately from that point. The decompressor can decompress a single verse -- it doesn't need to start at the first verse. (The decompressor needs more information than just the compressed version of the verse -- it also needs the global wordlist generated by the compressor.)
I am interested in other ways of decompressing just a verse or so, without needing to decompress everything from the beginning (and which still gives adequate compression).
--
David Cary
http://theconnexion.net/compass/index.php/User:DavidCary
http://groups.google.com/groups/search?q=%22Compressing+the+Bible+for+a+PDA%22
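
P.S. A rough Python sketch of decompressing a single verse from a byte pointer, as described above. The assumptions are mine, not the article's: the codeword format is a simplified stand-in (rank of the word in a global frequency-sorted wordlist, high bit set only on the first byte of each codeword), and the per-verse index of (byte offset, word count) pairs is a hypothetical structure invented for this example.

```python
from collections import Counter

def encode_rank(rank):
    """1 byte for ranks 0..127, 2 bytes for the next 128**2 ranks, and so on.
    Only the first byte of a codeword has its high bit set."""
    length, span = 1, 128
    while rank >= span:
        rank -= span
        length += 1
        span = 128 ** length
    digits = [(rank >> (7 * i)) & 0x7F for i in reversed(range(length))]
    digits[0] |= 0x80
    return bytes(digits)

def decode_words(data, start, count, vocab):
    """Decode `count` words beginning at byte offset `start`.
    Needs only the bytes from `start` onward plus the global wordlist."""
    words, pos = [], start
    while len(words) < count and pos < len(data):
        end = pos + 1
        while end < len(data) and not data[end] & 0x80:
            end += 1  # continuation bytes have the high bit clear
        length = end - pos
        value = data[pos] & 0x7F
        for b in data[pos + 1:end]:
            value = (value << 7) | b
        # Undo the per-length offset applied by encode_rank.
        rank = value + sum(128 ** k for k in range(1, length))
        words.append(vocab[rank])
        pos = end
    return words

verses = [
    "in the beginning god created the heaven and the earth",
    "and the earth was without form and void",
    "and god said let there be light",
]

# The compressor scans the whole text once to build the global wordlist.
counts = Counter(" ".join(verses).split())
vocab = sorted(counts, key=lambda w: (-counts[w], w))
codes = {w: encode_rank(r) for r, w in enumerate(vocab)}

# Compress verse by verse, remembering where each verse starts.
blob, index = bytearray(), []
for verse in verses:
    tokens = verse.split()
    index.append((len(blob), len(tokens)))
    for token in tokens:
        blob += codes[token]

# Decompress only the third verse, without touching the first two.
offset, count = index[2]
print(" ".join(decode_words(bytes(blob), offset, count, vocab)))
```

The decoder never looks at bytes before the verse's offset, which is the property that matters here; the price is that the wordlist has to be available alongside the compressed text.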
