[sphinx-users] Re: Documentation about the built in search?

bradley...@gmail.com Sat, 16 Mar 2024 06:20:41 -0700

I have a Sphinx wrapper and wanted to configure the Sphinx search to use
all the words in the index, but I could not figure out how to get Sphinx to
do this. (For the Sphinx wrapper, the words in the index are the words in
the headings, titles, and page names.)


I used raw HTML to get the search I wanted. For example, here is the
search for the Sphinx wrapper:
https://xrst.readthedocs.io/latest/xrst_search.html

It would be nice if there were a way to get the Sphinx search to function
like the example above.
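
To make the idea concrete, here is a minimal sketch of that kind of keyword
search: an index built only from the words in page names, titles, and
headings, queried so that every search word must match. The page data and the
function names are made up for illustration; this is not how xrst or Sphinx
implements its search.

```python
import re
from collections import defaultdict

# Hypothetical pages: page name -> (title, headings). Not real xrst data.
PAGES = {
    "install": ("Installing the package", ["Requirements", "Using pip"]),
    "config": ("Configuration file", ["Search options", "Theme options"]),
    "search": ("Search the documentation", ["Keyword search"]),
}


def build_index(pages):
    """Map each lowercase word to the set of page names that mention it."""
    index = defaultdict(set)
    for name, (title, headings) in pages.items():
        text = " ".join([name, title, *headings])
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages whose indexed words include every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "search"))
print(search(INDEX, "search options"))
```

Because only names, titles, and headings are indexed, the index stays small
enough to ship with a static page and to filter on every keystroke.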

On Thursday, April 7, 2022 at 9:29:39 PM UTC-7 Charles Bouchard-Légaré 
wrote:

> I have also been looking into it lately and the only thing I found was...
> reading the source code.
>
> I was looking into improving the search widget, trying to get
> closer to what is available with MkDocs (modal results as you type, etc.;
> see screenshot below).
>
> *Here is what I found about Sphinx' search:*
>
>    - The JavaScript code that does the actual search against the index is
>    at sphinx/themes/basic/static/searchtools.js
>    <https://github.com/sphinx-doc/sphinx/blob/5.x/sphinx/themes/basic/static/searchtools.js>
>    - The index is generated at build time, and the code is in
>    sphinx/search/__init__.py
>    <https://github.com/sphinx-doc/sphinx/blob/5.x/sphinx/search/__init__.py>
>    - Sphinx does not only do «full text search»
>       - First, it looks into «objects» (like Python functions and such)
>          - The displayed result is built from the domain's display name and
>          the object type's localized name
>          - No excerpt is provided here (which is kind of sad)
>       - Then, it tries full text search
>          - An excerpt built from the text around the result is displayed
>          - I am no expert in full-text search, but it looks both simple
>          and pretty standard, with more priority given to terms from titles.
>          There is a good stemmer for several languages.
>    - Sphinx' search is clearly *not* an API
>       - It is not very configurable and does not seem to be part of a
>       public API for users or extension developers to build on.
>    - When writing a new Domain, objects are provided by the get_objects
>    method
>    <https://www.sphinx-doc.org/en/master/extdev/domainapi.html?highlight=Domain#sphinx.domains.Domain.get_objects>
>    (which must be provided by your implementation)
>       - It returns an iterable of «objects», each a 6-tuple
>       - The last item, priority, determines how important an object is
>       regarding search
>       - The URL built for a search result depends on the first, second
>       and 5th items
>          - fullname «Fully qualified name.»
>          - dispname «Name to display when searching/linking.»
>          - anchor «The anchor name for the object.»
>       - In my custom Domain, search-generated URLs don't target the
>       actual documented object. I still need to investigate how the Directive
>       implementation, these three «object tuple» attributes and the search
>       work together. There still seem to be a few Python-specifics in there.
>    - As part of WebSupport, Sphinx provides a few utilities to enable
>    server-side search
>    <https://www.sphinx-doc.org/en/master/usage/advanced/websupport/searchadapters.html>.
>    Personally, this is not interesting to me at the moment.
>    
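
A minimal sketch of the 6-tuple described above, for readers writing their
own Domain. The field order and the priority values follow the get_objects
documentation as I understand it. The "recipe" domain, its data, and the
searchable() helper are hypothetical. The class deliberately does not
subclass sphinx.domains.Domain so that the snippet runs without Sphinx
installed; in a real extension get_objects would be a method of the Domain
subclass.

```python
from typing import Iterator, NamedTuple


class ObjectEntry(NamedTuple):
    """One entry of get_objects(); field order follows the documented tuple."""
    name: str       # fully qualified name
    dispname: str   # name to display when searching/linking
    type: str       # object type, a key of the domain's object_types
    docname: str    # document in which the object is described
    anchor: str     # anchor name for the object
    priority: int   # 1 default, 0 important, 2 unimportant, -1 hidden from search


class RecipeDomainSketch:
    """Stand-in for a Domain subclass; 'recipe' is a made-up domain."""

    def __init__(self):
        # In a real domain this would live in self.data, filled by directives.
        self.recipes = [
            ("recipe.pancakes", "pancakes", "cooking/breakfast",
             "recipe-pancakes", 1),
            ("recipe.internal-notes", "internal-notes", "cooking/notes",
             "recipe-internal-notes", -1),
        ]

    def get_objects(self) -> Iterator[ObjectEntry]:
        for name, dispname, docname, anchor, priority in self.recipes:
            yield ObjectEntry(name, dispname, "recipe", docname, anchor, priority)


def searchable(domain) -> list:
    """Keep only objects that are meant to show up in search (priority >= 0)."""
    return [obj for obj in domain.get_objects() if obj.priority >= 0]


for obj in searchable(RecipeDomainSketch()):
    print(f"{obj.dispname}: {obj.docname}#{obj.anchor}")
```

If search results point at the wrong place, comparing the anchor yielded here
with the id that the directive actually puts on its target node is a
reasonable first check.
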
> *For comparison, here is what I found about MkDocs*
>
>    - Unless specific plugins are used, only full text search is done.
>    - It uses lunr.js <https://lunrjs.com/>
>       - The documents MkDocs registers to lunr.js are, from my 
>       understanding
>          - All pages
>          - All sub sections, recursively
>          - Which means some text is added multiple times.  I suspect
>          subsections are prioritized in the results
>          - Each item provides two "fields": title and text, somewhat like 
>          Sphinx.
>       - By default, they used to use lunr.py
>       <https://github.com/yeraydiazdiaz/lunr.py> for pregenerating the
>       index.  This pregeneration is configurable.
>          - This is deprecated now because lunr.py has binary transitive
>          dependencies for non-English languages, which makes MkDocs harder
>          to use for Alpine Docker image users.
>          - They now offer to run lunr.js in a Node.js subprocess
>          - The index can also be generated by Web Workers
>          
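
The indexing scheme described above (one search document per page and per
sub section, recursively) can be sketched as follows. This only illustrates
the description in the quoted message, not MkDocs' actual code: the Section
class, the record keys, and search_documents() are assumptions made for the
example.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical section tree; not a MkDocs or lunr type."""
    title: str
    body: str
    anchor: str = ""
    children: list = field(default_factory=list)

    def full_text(self) -> str:
        """Own text plus the text of every nested section."""
        parts = [self.body] + [child.full_text() for child in self.children]
        return " ".join(p for p in parts if p)


def search_documents(page_url: str, node: Section) -> list:
    """Return one {location, title, text} record for the node and each descendant."""
    location = f"{page_url}#{node.anchor}" if node.anchor else page_url
    docs = [{"location": location, "title": node.title, "text": node.full_text()}]
    for child in node.children:
        docs.extend(search_documents(page_url, child))
    return docs


page = Section("Install", "How to install.", children=[
    Section("With pip", "Run pip install.", anchor="with-pip"),
])
DOCS = search_documents("install/", page)
for doc in DOCS:
    print(doc["location"], "|", doc["text"])
```

The text of "With pip" ends up in two records, once under the page and once
under its own anchor, which is the duplication mentioned above.
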
> *Other info I found:*
>
>    - ReadTheDocs has quite interesting search features 
>    <https://docs.readthedocs.io/en/stable/guides/advanced-search.html>
>    - Someone made a lunr.js extension
>    <https://github.com/rmcgibbo/sphinxcontrib-lunrsearch> for Sphinx, but
>    it only indexes "objects", in a separate custom search widget. It is not
>    actively maintained.
>    - I've looked into trying lunr in Sphinx for full text search.  Building an
>    index would be quite simple with an EnvironmentCollector, but leveraging
>    incremental builds would not yield all the optimization one could want
>    because lunr dropped editable indices.  Here is an *untested* stub,
>    which would still need to be integrated with Sphinx's APIs, to give an idea:
>       from docutils import nodes
>       from lunr import get_default_builder
>       from sphinx.environment import BuildEnvironment
>
>       class Search:
>           def __init__(self, env: BuildEnvironment):
>               self._env = env
>               self._builder = get_default_builder()
>               self._builder.ref("id")
>               self._builder.field("title")
>               self._builder.field("text")
>
>           def index_document(self, node: nodes.document):
>               # Index the whole page, then every (sub)section on its own.
>               self._builder.add(self.extract_search_document(node, section=False))
>               for element in node.findall(nodes.section):
>                   self._builder.add(self.extract_search_document(element))
>
>           def extract_search_document(self, node: nodes.Node, section=True):
>               title_node = next(node.findall(nodes.title))
>               uri = self._env.app.builder.get_target_uri(self._env.docname)
>               if section:
>                   # Section ids live on the section node, not on its title.
>                   uri += "#" + node["ids"][0]
>
>               return {
>                   "id": uri,
>                   "title": title_node.astext(),
>                   "text": node.astext()
>               }
>    - All in all, I am not sure it is worth investing much in a 3rd
>    party search engine such as lunr. I cannot yet prove that it would
>    provide much of an improvement over Sphinx' search.  Adding such dependencies
>    to Sphinx would probably not be acceptable anyway.  Even as a separate
>    extension, I don't clearly see an improvement here.
>    - I see a lot of improvements that can be made by themes. I am not sure
>    whether Sphinx' client-side search JavaScript code could be used for
>    queries «as you type» efficiently, but having an overlay or modal result
>    display would be great in my opinion. Sadly, while my Python skills are quite
>    good and I can play with JS a bit, web development is something I never
>    invested time or focus in.  Thus working on this would require tens of
>    hours of unpleasantness, which is quite daunting I must admit.
>
> All-in-all, I would really like to help improve the search experience with 
> Sphinx, especially on static websites outside of ReadTheDocs. I feel that 
> the best early improvements to be done have to be in themes (improve the 
> UI) and this is something I don't feel I can help much with.  I would 
> gladly team up with anybody with Webdev skills to do something about it!
>
> *MkDocs Search Screenshot*
> [image: pydantic-search.png]
> On Saturday, December 11, 2021 at 3:30:47 PM UTC-5 martin...@gmail.com 
> wrote:
>
>> Hello friends, 
>>
>> is anybody aware of an article, blog post or other documentation that 
>> helps understand how the built in search works? 
>>
>> Regards, Martin 
>>
>> -- 
>>
>> See our Sphinx made docs at https://docs.typo3.org/ 
>>
>>
