Re: HTML sample.html not indexing in Solr 8.8

cratervoid Sun, 21 Feb 2021 14:07:55 -0800

Thanks Shawn. I copied the solrconfig.xml file from the gettingstarted
example on my 7.7.3 installation to the 8.8.0 installation, restarted the
server, and it now works. Comparing the two files, it looks like, as you
said, this section was left out of the _default/solrconfig.xml file in
version 8.8.0:


<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

So those trying out the tutorial will need to add this section to get it to
work for sample.html.
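
For anyone adding the handler by hand rather than copying a whole
solrconfig.xml: as Shawn notes in the quoted message, the handler also needs
the Solr Cell (extraction) contrib jars on the classpath. A rough sketch of
the <lib> directives that would do this, assuming the stock 8.x directory
layout (contrib/extraction/lib and dist/ under the install directory) and a
core located where the default relative path resolves correctly. I have not
checked whether the 8.8.0 _default config already contains these lines, so
treat the paths as assumptions and adjust them to your installation:

```xml
<!-- Assumed paths: stock Solr 8.x layout. Place inside the <config> element
     of solrconfig.xml, before the request handler definitions. -->

<!-- Tika and its dependencies used by the extracting request handler -->
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

<!-- The Solr Cell jar that provides solr.extraction.ExtractingRequestHandler -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
```

If the jars are missing, I would expect the handler to fail when it is first
used, since it is declared with startup="lazy" and is not loaded at core
startup.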



On Sat, Feb 20, 2021 at 4:21 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/20/2021 3:58 PM, cratervoid wrote:
> > SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> >
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html&literal.id=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html
>
> The problem here is that the solrconfig.xml in use by the index named
> "gettingstarted" does not define a handler at /update/extract.
>
> Typically a handler defined at that URL path will utilize the extracting
> request handler class.  This handler uses Tika (another Apache project)
> to extract usable data from rich text formats like PDF, HTML, etc.
>
>    <!-- Solr Cell Update Request Handler
>
>         http://wiki.apache.org/solr/ExtractingRequestHandler
>
>      -->
>    <requestHandler name="/update/extract"
>                    startup="lazy"
>                    class="solr.extraction.ExtractingRequestHandler" >
>      <lst name="defaults">
>        <str name="lowernames">true</str>
>        <str name="fmap.meta">ignored_</str>
>        <str name="fmap.content">_text_</str>
>      </lst>
>    </requestHandler>
>
> Note that using this handler will require adding some contrib jars to Solr.
>
> Tika can become very unstable because it deals with undocumented file
> formats, so we do not recommend using that handler in production.  If
> the functionality is important, Tika should be included in a program
> that's separate from Solr, so that if it crashes, it does not take Solr
> down with it.
>
> Thanks,
> Shawn
>
