I had a similar problem once... it was some stupid syntax thing, lemme
check my setup...
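
In the meantime, here is a rough sketch of the nutch-site.xml pieces that usually matter for this, assuming a Nutch 1.x release where parse-metatags is paired with the index-metadata indexing filter. Only the parse-(html|tika|metatags) part comes from your message; the rest of the plugin.includes value is an illustrative guess and should be merged with whatever you already have. If index-metadata is missing or index.parse.md is unset, the parsed metatags never reach the indexer, and Solr fills in the schema default of "none".

```xml
<!-- Sketch only: assumes Nutch 1.x; adjust the plugin list to your own setup -->
<property>
  <name>plugin.includes</name>
  <!-- Everything except parse-(html|tika|metatags) is an assumed example value -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

<property>
  <!-- Meta tag names for parse-metatags to extract from the page head -->
  <name>metatags.names</name>
  <value>description,keywords</value>
</property>

<property>
  <!-- Parse-metadata keys that index-metadata copies into index fields;
       parse-metatags stores them with a "metatag." prefix -->
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords</value>
</property>
```

The field names in index.parse.md would then line up with the metatag.description and metatag.keywords fields already in your schema.xml. Some older releases used a different separator in metatags.names, so check the nutch-default.xml that ships with your version.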

On Fri, Sep 9, 2016 at 2:46 PM, KRIS MUSSHORN <[email protected]> wrote:

> Looks like this is NOT in fact working.
>
> How do I get the metatags into Solr?
>
> I have a webpage @ https://snip/inside/directorates/cisd/asset.cfm that
> has this in source:
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
> <title>Asset Control and Behavior Branch</title>
> <meta name="keywords" content="Computational and Information Sciences,
> CISD, Tokarcik, research, data fusion, knowledge management, battlespace
> weather, environmental effects, computational science and engineering,
> battlefield communications and networks ">
> <meta name="description" content="This page explains the CISD mission and
> hosts the biographies of the CISD Director and Deputy Director.">
>
> The parse-metatags plugin is set up in nutch-site.xml as
> parse-(html|tika|metatags)
>
> Solr schema.xml is correctly set up to receive the metatags:
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory" />
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="false" />
> <filter class="solr.LowerCaseFilterFactory" />
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory" />
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true" />
> <filter class="solr.LowerCaseFilterFactory" />
> </analyzer>
> </fieldType>
>
> <field name="metatag.description" type="text_general" stored="true"
> indexed="true" default="none" />
> <field name="metatag.keywords" type="text_general" stored="true"
> indexed="true" default="none" />
> <field name="metatag.date" type="text_general" stored="true"
> indexed="true" default="none" />
>
> After indexing the document, Solr shows:
> "title": "Asset Control and Behavior Branch",
> "metatag.date": "none",
> "metatag.description": "none",
> "metatag.keywords": "none"
>
> How do I get a Solr result of:
> "title": "Asset Control and Behavior Branch",
> "metatag.date": "none",
> "metatag.description": "This page explains the CISD mission and hosts
> the biographies of the CISD Director and Deputy Director.",
> "metatag.keywords": "Computational and Information Sciences, CISD,
> Tokarcik, research, data fusion, knowledge management, battlespace weather,
> environmental effects, computational science and engineering, battlefield
> communications and networks"
>
> Kris
>
