Looks like this is NOT in fact working. 

How do I get the metatags into Solr? 

I have a webpage at https://snip/inside/directorates/cisd/asset.cfm whose source contains:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Asset Control and Behavior Branch</title>
<meta name="keywords" content="Computational and Information Sciences, CISD, Tokarcik, research, data fusion, knowledge management, battlespace weather, environmental effects, computational science and engineering, battlefield communications and networks ">
<meta name="description" content="This page explains the CISD mission and hosts the biographies of the CISD Director and Deputy Director.">

The metatags parse plugin is set up in nutch-site.xml as
parse-(html|tika|metatags).
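For reference, here is a minimal sketch of the corresponding nutch-site.xml entries, assuming Nutch 1.x. Only the parse-(html|tika|metatags) part comes from the setup described above. The rest of the plugin.includes value is illustrative and version-dependent. The metatags.names and index.parse.md properties are the ones the Nutch metatags documentation pairs with the index-metadata plugin, and their inclusion here is an assumption about a typical configuration, not a confirmed fix.

```xml
<!-- nutch-site.xml: sketch assuming Nutch 1.x. Everything except
     parse-(html|tika|metatags) is illustrative and depends on the Nutch version. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- parse-metatags extracts the tags; index-metadata copies them into the indexed document -->
    <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <!-- meta tag names to extract; stored in the parse metadata with a "metatag." prefix -->
    <name>metatags.names</name>
    <value>description,keywords</value>
  </property>
  <property>
    <!-- parse-metadata keys that index-metadata adds as fields of the indexed document -->
    <name>index.parse.md</name>
    <value>metatag.description,metatag.keywords</value>
  </property>
</configuration>
```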

Solr schema.xml is correctly set up to receive the metatags: 
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<field name="metatag.description" type="text_general" stored="true" indexed="true" default="none" />
<field name="metatag.keywords" type="text_general" stored="true" indexed="true" default="none" />
<field name="metatag.date" type="text_general" stored="true" indexed="true" default="none" />

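After indexing the document, Solr shows the following. Since "none" is the default declared in schema.xml, it appears that Nutch is not sending any value for these fields:
"title": "Asset Control and Behavior Branch",
"metatag.date": "none",
"metatag.description": "none",
"metatag.keywords": "none"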

How do I get Solr to return this instead:
"title": "Asset Control and Behavior Branch",
"metatag.date": "none",
"metatag.description": "This page explains the CISD mission and hosts the biographies of the CISD Director and Deputy Director.",
"metatag.keywords": "Computational and Information Sciences, CISD, Tokarcik, research, data fusion, knowledge management, battlespace weather, environmental effects, computational science and engineering, battlefield communications and networks"

Kris 
