On 09/04/2012 07:51 PM, Pavel Hlavnicka wrote:
Dear all,
We are using SQLite FTS4 to build a full-text index on data that
should not be available to the user without decryption inside the
application. FTS4 fits well: we can use either a contentless table
or the compress/uncompress parameters to encrypt the plain text
data.
My question is whether an advanced user would be able to rebuild the
plain text data from the full-text index alone.
I did some experiments, and it seems that fts4aux yields only a list
of tokens, and whenever the offsets() function is called it either
fails (for a contentless index) or needs to read the plaintext through
the function given in the uncompress parameter (which would mean that
a user who cannot decrypt the data cannot use the offsets() function).
On the other hand the documentation on the index structure
(http://www.sqlite.org/fts3.html#section_9_4) says it keeps offsets
internally.
Questions:
1) Is it possible to obtain term offsets from a contentless FTS4 table
(possibly by accessing the binary index format directly)?
2) Is it possible to obtain term offsets from an FTS4 table that defines
an uncompress function without this function being called?
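The experiment described in the question can be sketched roughly as
follows. This is only an illustration, assuming Python's sqlite3 module
is linked against an SQLite build with FTS4 enabled (3.7.9 or later for
contentless tables); the table names docs and docs_terms are made up
for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Contentless FTS4 table: only the full-text index is stored, not the text.
conn.execute('CREATE VIRTUAL TABLE docs USING fts4(content="", body)')

# Contentless tables need an explicit docid on insert.
conn.execute(
    "INSERT INTO docs(docid, body) VALUES (?, ?)",
    (1, "Ancestral voices prophesying war!"),
)

# fts4aux exposes the terms held in the full-text index of "docs".
conn.execute("CREATE VIRTUAL TABLE docs_terms USING fts4aux(docs)")

# The col = '*' rows aggregate each term over all columns.
terms = [
    row[0]
    for row in conn.execute(
        "SELECT term FROM docs_terms WHERE col = '*' ORDER BY term"
    )
]

# Matching still works, because it only needs the index.
matches = [
    row[0] for row in conn.execute("SELECT docid FROM docs WHERE docs MATCH 'war'")
]

print(terms)
print(matches)
```

The term list comes back even though the original text was never
stored, which matches the observation that fts4aux gives a list of
tokens.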
The offsets stored in the full-text index are measured in tokens,
not bytes or characters. From earlier on the same webpage:
A list of term offsets, one for each occurrence of the term within
the document. A term offset indicates the number of tokens (words) that
occur before the term in question, not the number of characters or
bytes. For example, the term offset of the term "war" in the phrase
"Ancestral voices prophesying war!" is 3.
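To make the token-based offsets concrete, here is a minimal sketch in
plain Python. It does not read the FTS4 index format; the tokenize()
helper is a hypothetical stand-in that only approximates the default
"simple" tokenizer (lowercase, split on non-alphanumeric ASCII). It
also shows that, in principle, a term-to-offsets mapping is enough to
put the tokens of a document back in order, though not the original
punctuation or case.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: rough approximation of the "simple" tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_offsets(text):
    """Map each term to the token positions at which it occurs."""
    positions = defaultdict(list)
    for index, term in enumerate(tokenize(text)):
        positions[term].append(index)
    return dict(positions)


def rebuild_tokens(positions):
    """Reassemble the token sequence from a term -> offsets mapping."""
    placed = sorted(
        (offset, term)
        for term, offsets in positions.items()
        for offset in offsets
    )
    return [term for _, term in placed]


index = term_offsets("Ancestral voices prophesying war!")
print(index["war"])           # [3]
print(rebuild_tokens(index))  # ['ancestral', 'voices', 'prophesying', 'war']
```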
Dan.