Re: [sword-devel] HowTo: create ztext module?

David Cary Thu, 25 May 2006 14:05:09 -0700

Dear SWORD developers,

From: L.Allan-pbio

...

I can think of several reasons for rawtext (non-compressed):

...

2. Search speed can be significantly faster. ...


That may be true for zText. However, some compression formats can be
searched faster than plain text.

3. It is easier to debug/examine a module. You can use a text editor ...


I think this is the overwhelming reason in favor of plain text.
http://c2.com/cgi/wiki?PowerOfPlainText
has convinced me to stick with plain text format (and plain-text-like
formats, such as HTML) if at all possible.

From: L.Allan-pbio

...

I defined a sourceforge project BibleDb that would
be optimized for Bible decompression/decryption/search speed (not
necessarily for compression ratio).

...

BibleDB is only in pre-alpha stage.
http://sourceforge.net/project/admin/?group_id=117234


Interesting. I will look at this soon.

Perhaps we can apply some of the ideas from this article:

"Compression: A Key for Next-Generation Text Retrieval Systems"
by Nivio Ziviani, Edleno Silva de Moura, Gonzalo Navarro, and Ricardo
Baeza-Yates, in _Computer_ magazine, November 2000.

Their decompressor reads 1, 2, or 3 whole bytes of compressed data
and decompresses them (using a vocabulary list) into a whole word. This
makes many kinds of searches *much* faster: one can search the
compressed text directly for words or phrases, and this turns out to be
faster than searching the uncompressed text.
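
To make the word-per-codeword idea concrete, here is a minimal Python
sketch. It is a simplified stand-in, not the article's exact method: the
article uses byte-oriented Huffman codes, while this sketch simply gives
each word a codeword of one or more whole bytes by frequency rank, and
marks the last byte of every codeword by setting its high bit. The
names `codeword`, `build_code`, `compress` and `decompress` are
hypothetical, and words are assumed to be separated by single spaces
(punctuation stays attached to the word).

```python
from collections import Counter


def codeword(rank: int) -> bytes:
    """Map a 0-based frequency rank to a byte codeword (hypothetical scheme).

    Ranks 0-127 get one byte, the next 128*128 ranks get two bytes, and so on.
    Only the last byte of each codeword has its high bit (0x80) set.
    """
    out = bytearray([0x80 | (rank % 128)])
    rank //= 128
    while rank > 0:
        rank -= 1
        out.insert(0, rank % 128)
        rank //= 128
    return bytes(out)


def build_code(text: str) -> dict[str, bytes]:
    """Scan the whole text once; the most common words get the shortest codes."""
    counts = Counter(text.split())
    return {word: codeword(rank)
            for rank, (word, _) in enumerate(counts.most_common())}


def compress(text: str, code: dict[str, bytes]) -> bytes:
    return b"".join(code[word] for word in text.split())


def decompress(data: bytes, code: dict[str, bytes]) -> str:
    """Read whole bytes until a tagged byte ends the codeword, then emit the word."""
    vocabulary = {cw: word for word, cw in code.items()}
    words, start = [], 0
    for i, byte in enumerate(data):
        if byte & 0x80:  # last byte of a codeword
            words.append(vocabulary[data[start:i + 1]])
            start = i + 1
    return " ".join(words)


text = "in the beginning god created the heaven and the earth"
code = build_code(text)
packed = compress(text, code)
assert decompress(packed, code) == text
print(len(text.encode()), "->", len(packed), "bytes")
```

With a real Bible the vocabulary is larger than 128 words, so rarer
words fall into the two- and three-byte codewords.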

(Rather than *uncompressing* the entire Bible, and comparing the
uncompressed Bible to the search string, we can *compress* just the
search string, then compare the compressed Bible directly to the
compressed search string).
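
A minimal Python sketch of "compress the search string, then search the
compressed bytes", under the same kind of simplifying assumption: a
hypothetical byte code (standing in for the article's Huffman code) in
which the last byte of every codeword has its high bit set. That tag is
what makes the direct byte search safe: a hit only counts if it starts
right after a tagged byte, so it cannot begin in the middle of some
other word's codeword. The function names are hypothetical.

```python
from collections import Counter


def codeword(rank: int) -> bytes:
    # Hypothetical byte code: the last byte of each codeword has bit 0x80 set.
    out = bytearray([0x80 | (rank % 128)])
    rank //= 128
    while rank > 0:
        rank -= 1
        out.insert(0, rank % 128)
        rank //= 128
    return bytes(out)


def build_code(text: str) -> dict[str, bytes]:
    counts = Counter(text.split())
    return {w: codeword(r) for r, (w, _) in enumerate(counts.most_common())}


def compress(text: str, code: dict[str, bytes]) -> bytes:
    return b"".join(code[w] for w in text.split())


def search_compressed(data: bytes, code: dict[str, bytes], phrase: str) -> list[int]:
    """Return byte offsets in `data` where `phrase` starts, without decompressing."""
    try:
        pattern = compress(phrase, code)  # compress only the search string
    except KeyError:
        return []  # a word missing from the vocabulary cannot occur in the text
    if not pattern:
        return []
    hits, pos = [], data.find(pattern)
    while pos != -1:
        # A real match must start on a codeword boundary (right after a tagged byte).
        if pos == 0 or data[pos - 1] & 0x80:
            hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


text = "the lord is my shepherd i shall not want the lord is good"
code = build_code(text)
data = compress(text, code)
print(search_compressed(data, code, "the lord"))
```

Nothing is decompressed during the search; only the matching verses
would need to be decompressed for display.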

The article also has lots of other ideas about compressing indexes and
approximate-match searching.

From: L.Allan-pbio
My limited experience is that if you don't have a large block of data
(book), then the compression ratio isn't very good.


That's very true. But I hope you can see that:
* Ziviani's technique *does* have a large block of data, so
potentially the compression ratio can be good. To give the best
compression, the compressor scans the entire Bible (in order to pick
out the most-common words and give them one-byte representations).
* Ziviani's technique lets you point to any word in the text with a
normal (byte) pointer and start decompressing immediately from that
point. The decompressor can decompress a single verse -- it doesn't
need to start at the first verse. (The decompressor needs more
information than just the compressed version of the verse -- it also
needs the global wordlist generated by the compressor).
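
The random-access point can be sketched in Python too, again with a
hypothetical byte code standing in for the article's Huffman code (the
last byte of every codeword has its high bit set). The compressor
builds one global wordlist from the whole text and records the byte
offset where each verse starts; the decompressor then needs only that
offset and the wordlist to decode a single verse. The names
`build_index`, `decompress_range` and `get_verse` are hypothetical.

```python
from collections import Counter


def codeword(rank: int) -> bytes:
    # Hypothetical byte code: the last byte of each codeword has bit 0x80 set.
    out = bytearray([0x80 | (rank % 128)])
    rank //= 128
    while rank > 0:
        rank -= 1
        out.insert(0, rank % 128)
        rank //= 128
    return bytes(out)


def build_index(verses: list[str]):
    """Compress verses back to back with one global code; record verse offsets."""
    counts = Counter(w for verse in verses for w in verse.split())
    code = {w: codeword(r) for r, (w, _) in enumerate(counts.most_common())}
    data, offsets = bytearray(), []
    for verse in verses:
        offsets.append(len(data))  # a plain byte pointer to the verse
        data += b"".join(code[w] for w in verse.split())
    offsets.append(len(data))  # end sentinel
    wordlist = {cw: w for w, cw in code.items()}  # the global wordlist
    return bytes(data), offsets, wordlist


def decompress_range(data: bytes, start: int, end: int,
                     wordlist: dict[bytes, str]) -> str:
    """Decode data[start:end] alone; `start` must be a codeword boundary."""
    words, begin = [], start
    for i in range(start, end):
        if data[i] & 0x80:  # last byte of a codeword
            words.append(wordlist[data[begin:i + 1]])
            begin = i + 1
    return " ".join(words)


def get_verse(n: int, data: bytes, offsets: list[int],
              wordlist: dict[bytes, str]) -> str:
    return decompress_range(data, offsets[n], offsets[n + 1], wordlist)


verses = [
    "in the beginning god created the heaven and the earth",
    "and the earth was without form and void",
    "and god said let there be light",
]
data, offsets, wordlist = build_index(verses)
print(get_verse(2, data, offsets, wordlist))
```

Only the bytes of the requested verse are touched; earlier verses are
never decoded.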

I am interested in other ways of decompressing just a verse or so
without needing to decompress everything from the beginning, while
still getting adequate compression.
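
One conventional possibility (not from the article) is to compress each
verse separately with zlib, but give every verse the same preset
dictionary built from the whole text, so that even a short verse can
refer back to common words. The minimal Python sketch below assumes the
dictionary is simply the most frequent words, placed with the most
frequent at the end as the zlib manual recommends, and capped at the
32 KB deflate window; the function names are hypothetical, and whether
the ratio is "adequate" would have to be measured on a real module.

```python
import zlib
from collections import Counter


def build_dictionary(verses: list[str], max_size: int = 32 * 1024) -> bytes:
    """Hypothetical preset dictionary: frequent words, most frequent last."""
    counts = Counter(w for verse in verses for w in verse.split())
    chosen, size = [], 0
    for word, _ in counts.most_common():  # most frequent first
        piece = (word + " ").encode()
        if size + len(piece) > max_size:
            break
        chosen.append(piece)
        size += len(piece)
    return b"".join(reversed(chosen))  # put the most frequent words at the end


def compress_verse(verse: str, zdict: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(verse.encode()) + c.flush()


def decompress_verse(blob: bytes, zdict: bytes) -> str:
    d = zlib.decompressobj(zdict=zdict)
    return (d.decompress(blob) + d.flush()).decode()


verses = [
    "in the beginning god created the heaven and the earth",
    "and the earth was without form and void",
    "and god said let there be light",
]
zdict = build_dictionary(verses)
blobs = [compress_verse(v, zdict) for v in verses]  # one independent blob per verse
print(decompress_verse(blobs[1], zdict))
```

Decompressing one verse needs only its blob (found through an offset
index) and the shared dictionary. Comparing the blob sizes with and
without `zdict` on the real text would show how much the dictionary
actually helps.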

--
David Cary
http://theconnexion.net/compass/index.php/User:DavidCary
http://groups.google.com/groups/search?q=%22Compressing+the+Bible+for+a+PDA%22

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
