Thanks Karl. You were right about the problem being at the Solr side.
It's documented at
http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example

In ManifoldCF, I just added in the "Arguments" section of the Solr Output
definition:

uprefix=attr_
fmap.content=attr_content

All working now.

Thanks.

Arcadius.

On 18 July 2012 05:31, Karl Wright <[email protected]> wrote:
> Hi Arcadius,
>
> When you look at the ManifoldCF Simple History report after your
> crawl, does the length of each document that is indexed appear to be
> approximately correct? If it does, then it would be your Solr
> configuration that you'd need to look at to figure out why only the
> HEAD section is affecting the index.
>
> Thanks,
> Karl
>
>
> On Tue, Jul 17, 2012 at 9:15 PM, Arcadius Ahouansou
> <[email protected]> wrote:
> > Hello.
> >
> > I am new to ManifoldCF.
> >
> > I am able to crawl a page and index it into Solr.
> >
> > However, I have noticed that only the metadata in the HEAD tag are indexed
> > in Solr.
> >
> >
> > My question is: How can I get the content in the HTML BODY tag into Solr?
> >
> > Thank you very much.
> >
> > Arcadius.
> >
>
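For anyone reading this thread later, here is a rough sketch of what the two arguments from the fix (uprefix=attr_ and fmap.content=attr_content) look like as request parameters to Solr's ExtractingRequestHandler. As the linked wiki page describes, uprefix prepends attr_ to extracted fields the schema does not define, and fmap.content maps the extracted body text to the attr_content field. The base URL, the /update/extract path, the literal.id value, and the helper name build_extract_url are assumptions for illustration, not details from the thread. The code only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed values, not taken from the thread: a local Solr instance and the
# handler path used in the Solr example configuration.
SOLR_BASE = "http://localhost:8983/solr"
EXTRACT_PATH = "/update/extract"


def build_extract_url(doc_id, extra_args):
    """Hypothetical helper: build an extract request URL for one document."""
    # literal.id supplies the document's unique key field directly.
    params = {"literal.id": doc_id}
    params.update(extra_args)
    return SOLR_BASE + EXTRACT_PATH + "?" + urlencode(params)


# The two arguments added in the "Arguments" section of the Solr Output
# definition in ManifoldCF, as reported in the message above.
manifold_args = {
    "uprefix": "attr_",              # prefix for fields unknown to the schema
    "fmap.content": "attr_content",  # map extracted body text to attr_content
}

url = build_extract_url("doc1", manifold_args)
print(url)
```

ManifoldCF sends these arguments itself once they are entered in the output connection, so this is only meant to show what ends up on the request.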
