Re: How to fix searching

2016-10-13 Thread Mads Kiilerich

On 09/26/2016 06:55 PM, Dominik Ruf wrote:
2. searching in multiple repositories (incl. fulltext searching in the
files).


I haven't really used whoosh. I think it is a nice simple feature that
should work fine in some setups, but I would not expect it to be usable
on a repository as huge and heavily branched as the one in our Kallithea
installation. I have thus disabled it in our setup.


In my Kallithea setup, I think I would prefer a separate indexing
service, configured to run on only a few of our branches, with the
indexing tailored to our code.


We should however get the integrated whoosh working and make sure we are 
hitting real limits before we jump to conclusions.


I was not aware I had broken whoosh. What problem do you see?

/Mads
___
kallithea-general mailing list
kallithea-general@sfconservancy.org
http://lists.sfconservancy.org/mailman/listinfo/kallithea-general


Re: How to fix searching

2016-09-27 Thread Thomas De Schampheleire
Hi Dominik,

On Mon, Sep 26, 2016 at 6:55 PM, Dominik Ruf wrote:
> Hi,
>
> there are basically 2 different kinds of searches in kallithea.
>
> 1. filtering revisions
> Mads mentioned 2 years ago that he plans to add some support for this
> https://bitbucket.org/conservancy/kallithea/issues/18/search-needs-to-be-improved
> 2. searching in multiple repositories (incl. fulltext searching in the
> files)
>
> I think the first point is pretty much straightforward. Git and Mercurial
> support filtering revisions. It basically 'only' needs to be implemented.
> :-)
>
> But the second one is more complicated.
> There are multiple problems with the current implementation.
>
> 1. For starters, since 9c5f794df7cd the make-index command is broken. But
> that can be easily fixed.
> 2. What is not so easy to fix is the fact that indexing is currently
> incredibly slow.
> 3. The indexing is done periodically, it only indexes the tip revision at
> indexing time and the search results refer to the tip at search time.
> Therefore
>   a) you may get hits that are no longer valid
>   b) you may get no hits even though the string is present now
>   c) you can't search for things that have been removed
>
> I believe all this is solvable. I looked into the code and found a few
> places where the indexing can definitely be improved.
> But I don't have much experience with whoosh. So I'm not sure if it is even
> worth it to fix the current implementation, or if I should restart with solr
> or elastic search.
>
> My questions to you guys are:
>
> 1. Do you have experience with whoosh? Does it scale to gigabytes of data?
> 2. Would you even pull an implementation that requires installing solr? Note:
> I believe installation and setup of solr can be automated.
> 3. Or maybe you think the fulltext search should be dropped altogether.
>

I personally think that 'fulltext search' on repositories, which
typically contain source code, has relatively little value.
Fulltext search engines like whoosh or solr are not aware of the
structure of source code, and thus have no advanced capabilities to
search only in identifiers, or to click through on symbols in the
search result. Real code browsers, like OpenGrok or LXR, do have such
features.
The few times that I actually use fulltext search on e.g. GitHub are
when I'm too lazy to clone the repo and use a grep-like tool to find
it myself. It definitely has some value, but not so much.

With this in mind, I actually think there is much more value in fixing
the first type of search you highlight, i.e. filtering revisions.
Therefore, in my opinion we should prioritize 'just implementing' that
before looking at fulltext search.
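
To make the "both tools already support this" point concrete, here is a
minimal sketch of how a filter term could be turned into a command line
for each backend. It is not how Kallithea talks to repositories; the
function name revision_filter_command is hypothetical, and it only
builds the argument list without running anything. Note the two tools
differ in scope: as far as I know, 'hg log --keyword' also looks at user
names and changed file names, while 'git log --grep' only matches the
commit message.

```python
def revision_filter_command(vcs, term):
    """Build an argv list that filters revisions by a search term.

    Hypothetical helper for illustration; it only constructs the
    command and leaves running it (e.g. via subprocess) to the caller.
    """
    if vcs == "hg":
        # --keyword does a case-insensitive search for the given text.
        return ["hg", "log", "--keyword", term]
    if vcs == "git":
        # --grep matches commit messages; treat the term as a plain
        # string rather than a regular expression, ignoring case.
        return [
            "git", "log",
            "--fixed-strings",
            "--regexp-ignore-case",
            "--grep=" + term,
        ]
    raise ValueError("unsupported vcs: %r" % (vcs,))
```

Passing the term as its own argv element (instead of building a shell
string) avoids quoting problems with user-supplied input.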

Coming back to fulltext search:
- I have no specific experience with whoosh
- Regardless of the tool we'd use (whoosh, solr, ...), I think it
should always be optional. Kallithea should be installable without
search capabilities.
- It may be more useful to implement search in a flexible way, where
Kallithea provides the search feature but the backend is customizable.
I.e. the search term can be passed to whoosh, solr, or any other tool
that the user wants to configure. The tool would get the search term
and probably some other elements referring to the repo to search or
specific paths in the repo. Kallithea documentation can give some
examples on how to plug in known tools into this, but need not be
concerned with the entire range of tools available, nor choose one
specific tool that may not scale to a particular use case. The same
mechanism could even be used to hook code browsers like OpenGrok/LXR
into the search feature, rather than pure text search.
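
A rough sketch of what such a customizable backend could look like
follows. Everything in it is an assumption made for illustration: the
names SearchBackend, SearchHit, DisabledBackend, InMemoryBackend and
make_backend do not exist in Kallithea, and the in-memory backend merely
stands in for a real tool such as whoosh or solr.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class SearchHit:
    """One matching line (hypothetical result type)."""
    repo: str
    path: str
    line_no: int
    line: str


class SearchBackend(ABC):
    """Hypothetical interface: gets the term, the repo and optional paths."""

    @abstractmethod
    def search(self, term: str, repo: str,
               paths: Optional[Sequence[str]] = None) -> List[SearchHit]:
        ...


class DisabledBackend(SearchBackend):
    """Search is optional: this backend never returns anything."""

    def search(self, term, repo, paths=None):
        return []


class InMemoryBackend(SearchBackend):
    """Naive stand-in for a real tool; scans {repo: {path: text}}."""

    def __init__(self, files: Dict[str, Dict[str, str]]):
        self._files = files

    def search(self, term, repo, paths=None):
        needle = term.lower()
        hits = []
        for path, text in sorted(self._files.get(repo, {}).items()):
            # Restrict to the requested path prefixes, if any were given.
            if paths is not None and not any(path.startswith(p) for p in paths):
                continue
            for line_no, line in enumerate(text.splitlines(), 1):
                if needle in line.lower():
                    hits.append(SearchHit(repo, path, line_no, line))
        return hits


# Maps a configured name to a backend class (names are made up).
BACKENDS = {"none": DisabledBackend, "memory": InMemoryBackend}


def make_backend(name: str, **options) -> SearchBackend:
    """Instantiate the backend selected in the (hypothetical) config."""
    try:
        backend_class = BACKENDS[name]
    except KeyError:
        raise ValueError("unknown search backend: %r" % (name,))
    return backend_class(**options)
```

With something along these lines, the web UI would only ever call
search() on whatever make_backend() returned, and an installation
without search would simply configure "none".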

Best regards,
Thomas