I don't have that property in my config at all.

On Sep 9, 2016 3:42 PM, "Musshorn, Kris T CTR USARMY RDECOM ARL (US)" <[email protected]> wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> Are you suggesting I should remove the index.metadata property completely
> or just supply no value?
>
> Thanks,
> Kris
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
> Kris T. Musshorn
> FileMaker Developer - Contractor – Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> [email protected]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -----Original Message-----
> From: BlackIce [mailto:[email protected]]
> Sent: Friday, September 09, 2016 9:31 AM
> To: [email protected]
> Subject: [Non-DoD Source] Re: indexing metatags with Nutch 1.12
>
> All active links contained in this email were disabled. Please verify the
> identity of the sender, and confirm the authenticity of all links contained
> within the message prior to copying and pasting the address to a Web
> browser.
>
> ----
>
> I had a similar problem; it took me days to figure out. I can't remember
> what exactly was going on, but it was some sort of conflict between
> parameters in site.xml. I think you need to leave this BLANK:
>
> <property>
> <name>
> index.metadata
> </name>
> <value>
> description,keywords
> </value>
> </property>
>
> My Set-up (Nutch 1.11):
>
> nutch-site.xml:
>
> <property>
> <name>plugin.includes</name>
> <value>nutch-extensionpoints|headings|language-identifier|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> <!-- index-metadata plugin properties -->
>
> <property>
> <name>index.parse.md</name>
> <value>metatag.description,metatag.keywords,h1,h2,h3,h4,h5,h6,metatag.title</value>
> </property>
>
> <!-- parse-metatags plugin properties -->
>
> <property>
> <name>metatags.names</name>
> <value>description,keywords,title,h1,h2,h3,h4,h5,h6</value>
> </property>
>
> On Fri, Sep 9, 2016 at 3:00 PM, BlackIce <[email protected]> wrote:
>
> > I had a similar problem once.. it was some stupid syntax thing, lemme
> > check my setup....
> >
> > On Fri, Sep 9, 2016 at 2:46 PM, KRIS MUSSHORN <[email protected]>
> > wrote:
> >
> >> Looks like this is NOT in fact working.
> >>
> >> How do I get the metatags into Solr?
> >>
> >> I have a webpage @
> >> https://snip/inside/directorates/cisd/asset.cfm that has this in source:
> >>
> >> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> >> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> >> <html xmlns="http://www.w3.org/1999/xhtml">
> >> <head>
> >> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
> >> <title>Asset Control and Behavior Branch</title>
> >> <meta name="keywords" content="Computational and Information Sciences,
> >> CISD, Tokarcik, research, data fusion, knowledge management,
> >> battlespace weather, environmental effects, computational science and
> >> engineering, battlefield communications and networks ">
> >> <meta name="description" content="This page explains the CISD mission and
> >> hosts the biographies of the CISD Director and Deputy Director.">
> >>
> >> The parse-metatags plugin is set up in nutch-site.xml as
> >> parse-(html|tika|metatags)
> >>
> >> Solr schema.xml is correctly set up to receive the metatags:
> >>
> >> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >> <analyzer type="index">
> >> <tokenizer class="solr.StandardTokenizerFactory" />
> >> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
> >> <filter class="solr.LowerCaseFilterFactory" />
> >> </analyzer>
> >> <analyzer type="query">
> >> <tokenizer class="solr.StandardTokenizerFactory" />
> >> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
> >> <filter class="solr.LowerCaseFilterFactory" />
> >> </analyzer>
> >> </fieldType>
> >>
> >> <field name="metatag.description" type="text_general" stored="true" indexed="true" default="none" />
> >> <field name="metatag.keywords" type="text_general" stored="true" indexed="true" default="none" />
> >> <field name="metatag.date" type="text_general" stored="true" indexed="true" default="none" />
> >>
> >> After indexing the document, Solr shows:
> >>
> >> "title": "Asset Control and Behavior Branch",
> >> "metatag.date": "none",
> >> "metatag.description": "none",
> >> "metatag.keywords": "none"
> >>
> >> How do I get a Solr result of:
> >>
> >> "title": "Asset Control and Behavior Branch",
> >> "metatag.date": "none",
> >> "metatag.description": "This page explains the CISD mission and hosts the biographies of the CISD Director and Deputy Director.",
> >> "metatag.keywords": "Computational and Information Sciences, CISD, Tokarcik, research, data fusion, knowledge management, battlespace weather, environmental effects, computational science and engineering, battlefield communications and networks"
> >>
> >> Kris

