Thanks Karl.

You were right about the problem being at the Solr side.

It's documented at
http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example



In ManifoldCF, I just added the following two arguments in the "Arguments"
section of the Solr Output definition:
uprefix=attr_  and  fmap.content=attr_content
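
For anyone reading this later, here is a rough sketch of what those two arguments amount to when they reach Solr's ExtractingRequestHandler as request parameters. uprefix=attr_ prefixes any extracted field that is not defined in the schema with attr_, and fmap.content=attr_content maps the extracted body text to the attr_content field. The host, port, handler path, the literal.id parameter and the helper name build_extract_url are assumptions for illustration only, not taken from my setup or from what ManifoldCF sends exactly.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and handler path are assumptions
# based on the stock Solr example, not on any particular installation.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL):
    """Build an extract request URL carrying the two arguments above."""
    params = [
        ("literal.id", doc_id),            # illustrative document id
        ("uprefix", "attr_"),              # prefix for fields unknown to the schema
        ("fmap.content", "attr_content"),  # map extracted body text to attr_content
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("doc1"))
```

This only builds the URL; it does not post a document to Solr.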

All working now.

Thanks.

Arcadius.

On 18 July 2012 05:31, Karl Wright <[email protected]> wrote:

> Hi Arcadius,
>
> When you look at the ManifoldCF Simple History report after your
> crawl, does the length of each document that is indexed appear to be
> approximately correct?  If it does, then it would be your Solr
> configuration that you'd need to look at to figure out why only the
> HEAD section is affecting the index.
>
> Thanks,
> Karl
>
>
> On Tue, Jul 17, 2012 at 9:15 PM, Arcadius Ahouansou
> <[email protected]> wrote:
> > Hello.
> >
> > I am new to ManifoldCF.
> >
> > I am able to crawl a page and index it into Solr.
> >
> > However, I have noticed that only the metadata in the HEAD tag are
> > indexed in Solr.
> >
> >
> > My question is: How can I get the content in the HTML BODY tag into Solr?
> >
> > Thank you very much.
> >
> > Arcadius.
> >
>
