[fossil-users] deep digging fossil repo

2015-02-14 Thread Petr Ferdus
Hello,

Could Fossil's FTS be used to search for a particular text or substring
hidden anywhere in a Fossil repository?

Based on my quick tests, it seems that:
 * the content of files removed from the repository (with fossil rm)
   is not returned on the search page
 * it is not possible to search content across several branches in one go
   (only one branch can be set in Document Branch, and it does not accept
   GLOB values)

Would it be possible to search for treasures anywhere in the fossilized
history of a Fossil repository, with assurance that nothing was overlooked?

Thanks

Peter
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] deep digging fossil repo

2015-02-14 Thread Richard Hipp
On 2/14/15, Petr Ferdus petr...@centrum.cz wrote:
> Could Fossil's FTS be used to search for a particular text or substring
> hidden anywhere in a Fossil repository?


Yes.  The infrastructure is designed with that in mind, but the
implementation has not been done yet.  I wanted to collect more
experience with the current search before getting into
universal-content search.

One problem to consider is that a typical full-text index requires
about 20% of the space of the original document.  The Fossil
self-hosting repository currently holds 2.3GB of content.  To build a
full-text index on it all would increase the repository size from 53MB
to about 500MB.  About 90% of the repository would be devoted to the
full-text index.  That does not seem desirable.
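
The arithmetic behind that estimate can be checked with a short sketch. It uses only the figures quoted above (2.3GB of content, a 53MB repository, an index of about 20% of the content size); the function and constant names are made up for illustration, and the 20% ratio is a rule of thumb, not a measurement.

```python
# Back-of-the-envelope estimate using the figures quoted in this message.
CONTENT_MB = 2300    # 2.3GB of content held by the repository
REPO_MB = 53         # current size of the repository file
INDEX_RATIO = 0.20   # assumed rule of thumb: index is ~20% of indexed text


def estimate_index(content_mb, repo_mb, ratio):
    """Return (index size, new repository size, index share of the total)."""
    index_mb = content_mb * ratio
    total_mb = repo_mb + index_mb
    return index_mb, total_mb, index_mb / total_mb


index_mb, total_mb, share = estimate_index(CONTENT_MB, REPO_MB, INDEX_RATIO)
print(f"index ~{index_mb:.0f}MB, repository ~{total_mb:.0f}MB, "
      f"index share ~{share:.0%}")
```

The index alone comes to roughly 460MB, which is where the "about 500MB" and "about 90%" figures come from.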

My plan to work around that is to only index the differences between
successive check-ins.  In other words, instead of indexing the
complete text of every document, only index the changes.  That will
probably reduce the size of the index to be proportional to the size
of the repository.  And, it means when you do a search, you are only
going to get hits for the particular versions where those words
actually change - move into or out of the document or are in close
proximity to other words that do.

Of course, you cannot simply index *only* the changes.  Each edit
needs some context.  How much context to include in the index is an
open question.
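
A minimal sketch of the idea, assuming line-based diffs and a plain in-memory inverted index: it is not how Fossil implements search (the implementation had not been written at this point), and names such as `changed_lines_with_context`, `index_checkin`, and the `context` parameter are hypothetical. Only the lines that changed between two versions, plus a configurable number of neighbouring lines, are fed to the index.

```python
import difflib
import re


def changed_lines_with_context(old_text, new_text, context=1):
    """Return removed and added lines plus `context` unchanged neighbours."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    kept = []
    for group in matcher.get_grouped_opcodes(context):
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                kept.extend(new[j1:j2])   # unchanged context lines
            else:
                kept.extend(old[i1:i2])   # text that moved out
                kept.extend(new[j1:j2])   # text that moved in
    return kept


def index_checkin(index, checkin_id, old_text, new_text, context=1):
    """Record every term near a change under the given check-in id."""
    for line in changed_lines_with_context(old_text, new_text, context):
        for term in re.findall(r"\w+", line.lower()):
            index.setdefault(term, set()).add(checkin_id)


v1 = ("open the file\nread the header\nparse the body\n"
      "close the file\nreport success\n")
v2 = ("open the file\nread the header\ndecode the payload\n"
      "close the file\nreport success\n")

index = {}
index_checkin(index, "c1", "", v1)   # initial check-in: everything is new
index_checkin(index, "c2", v1, v2)   # second check-in: one line changed

for term in ("payload", "parse", "success"):
    print(term, sorted(index.get(term, set())))
```

In this sketch a word that never changes after the first check-in ("success") is indexed once, a word that is replaced ("parse") is a hit both where it entered and where it left, and raising `context` pulls more unchanged neighbours into each check-in's entry at the cost of a larger index.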

-- 
D. Richard Hipp
d...@sqlite.org