Looks like this is NOT in fact working.

I have a webpage that has this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 
<title>Asset Control and Behavior Branch</title> 
<meta name="keywords" content="Computational and Information Sciences, CISD, 
Tokarcik, research, data fusion, knowledge management, battlespace weather, 
environmental effects, computational science and engineering, battlefield 
communications and networks "> 
<meta name="description" content="This page explains the CISD mission and hosts 
the biographies of the CISD Director and Deputy Director."> 


This is in nutch-site.xml:
parse-(html|tika|metatags) 
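For comparison, a sketch of the rest of the usual Nutch 1.x metatag setup, as an assumption rather than a confirmed fix: parse-metatags only extracts the tags into the parse metadata, and it is the index-metadata plugin together with the `index.parse.md` property that copies them into the indexed document as `metatag.*` fields. The full plugin list below is assumed from the Nutch 1.12 defaults; keep whatever other plugins are already enabled.

```xml
<!-- Sketch only: merge into the existing <configuration> in nutch-site.xml. -->
<!-- Assumed plugin list: Nutch 1.12 defaults plus parse-metatags and index-metadata. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<!-- Which <meta name="..."> tags parse-metatags should extract. -->
<property>
  <name>metatags.names</name>
  <value>description,keywords</value>
</property>
<!-- Which parse metadata keys index-metadata should send to the indexer. -->
<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords</value>
</property>
```

The quoted page has no date meta tag, so metatag.date would have nothing to index for it either way.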

The page is:
https://snip/inside/directorates/cisd/asset.cfm 

The Solr schema.xml is:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

with these fields:

<field name="metatag.description" type="text_general" stored="true" indexed="true" default="none" />
<field name="metatag.keywords" type="text_general" stored="true" indexed="true" default="none" />
<field name="metatag.date" type="text_general" stored="true" indexed="true" default="none" />
and the Solr result is:
"title": "Asset Control and Behavior Branch", "metatag.date": "none", "metatag.description": "none", "metatag.keywords": "none"
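One likely reading of that result: Solr adds a field's `default` value to any document that arrives without that field, so "none" here probably means Nutch sent no metatag.* fields at all, not that it sent empty ones. As a debugging aid only (an assumption, not a required change), the same field definitions without the default would make missing fields simply absent from the result:

```xml
<!-- Hypothetical debugging variant: no default="none", so a field that Nutch
     never sent does not appear in the stored Solr document at all. -->
<field name="metatag.description" type="text_general" stored="true" indexed="true" />
<field name="metatag.keywords" type="text_general" stored="true" indexed="true" />
<field name="metatag.date" type="text_general" stored="true" indexed="true" />
```

After reloading the core and reindexing, a page whose description never reached Solr would show no metatag.description key, which separates "not sent" from "sent".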

Kris 
----- Original Message -----

From: "KRIS MUSSHORN" <[email protected]> 
To: [email protected] 
Sent: Wednesday, September 7, 2016 9:24:36 AM 
Subject: indexing metatags with Nutch 1.12 

Looks like it's working correctly this morning using protocol-http and metatags...
I didn't do anything to cause it to work...

