Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-14 Thread Ralf Junker
Scott Hess wrote:
>>I am optimistic that the proper implementation will use even less than 50%:
>
>Indeed :-).
Glad to read this ;-)
>>I found that _not_ adding the original text turned out to be a great time
>>saver. This makes sense if we know that the original text is about 4 times
>>the

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ralf Junker
Hello Scott,
I was hoping that you would read my message, many thanks for your reply!
>UPDATE and DELETE need to have the previous document text, because the
>docids are embedded in the index, and there is no docid->term index
>(or, put another way, the previous document text _is_ the

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ralf Junker
Ion Silvestru wrote:
>Just a question: did you eliminate stop-words in your tests?
No, I did not eliminate any stop-words. The two test runs were identical except for the small changes in FTS 2. My stop words question was not intended for source code but for human language texts.
Ralf

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Scott Hess
On 3/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Ion Silvestru <[EMAIL PROTECTED]> wrote:
> To Ralf:
> >As a side effect, the offsets() and snippet() functions stopped working,
> >as they seem to rely on the presence of the full document text in the
> >current implementation.
>
> Did you

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread drh
Ion Silvestru <[EMAIL PROTECTED]> wrote:
> To Ralf:
>
> >As a side effect, the offsets() and snippet() functions stopped working, as
> >they seem to rely on the presence of the full document text in the current
> >implementation.
>
> Did you test "phrase" searching on the index-only version,

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ion Silvestru
To Ralf:
>As a side effect, the offsets() and snippet() functions stopped working, as
>they seem to rely on the presence of the full document text in the current
>implementation.
Did you test "phrase" searching on the index-only version? Doesn't this kind of search rely on offsets()?

Re[2]: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ion Silvestru
>Just a question: did you eliminate stop-words in your tests?
Sorry, you specified that you indexed source code files, so no stop-words are applicable here.

Re: [sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ion Silvestru
Thank you. Just a question: did you eliminate stop-words in your tests?
>Concluding: Given the great database size savings possible by separating full
>text index from data storage, I wish that developers would consider adding
>such an option to the SQLite FTS interface.
If such an option

[sqlite] FTS: index only, no text storage - Was: [sqlite] FTS: Custom Tokenizer / Stop Words

2007-03-13 Thread Ralf Junker
>But what about:
>
>I am very interested to know if it would be possible to use an FTS indexing
>module to store the inverted index only, but not the document's text. This
>would save disk space if the text to index is stored on disk rather than
>inside the database.
This is possible with just