Hi Shanaka,

I am developing NUTCH-1478, and I have some updates for it. If you run into any problems, I will answer your questions after I post my updated patch. You are also welcome to review my latest update :)
Talat

2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera:
> Hello,
>
> I have configured Nutch 2.2.1 following the Nutch2Tutorial
> <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it with Solr 4.7,
> and it is working fine. Then I wanted to parse HTML and index meta tags in
> Solr.
> Since parse-metatags is not supported by default, I followed "Parse-metatags
> and index-metadata plugin for Nutch 2.x
> series <https://issues.apache.org/jira/browse/NUTCH-1478>" and
> installed the patch NUTCH-1478v5.patch
> <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>
> I think I have installed it correctly, because I get the following output
> when I try to parse a URL:
>
> $ ./bin/nutch parsechecker http://nutch.apache.org/
> fetching: http://nutch.apache.org/
> parsing: http://nutch.apache.org/
> contentType: text/html
> signature: 030a8fe7684b5357663e041327e3d96b
> ---------
> Url
> ---------------
> http://nutch.apache.org/
> ---------
> Metadata
> ---------
> metatag.forrest-skin-name : nutch
> metatag.forrest-version : 0.10-dev
> metatag.generator : Apache Forrest
> metatag.content-type : text/html; charset=UTF-8
>
> Now I am trying to index the metadata along with the other content in Solr.
> I have updated the Solr schema.xml with <field name="meta_*" type="string"
> stored="true" indexed="true"/> to accept every generated field.
>
> My questions are:
> 1. How do I index the metadata in Solr? When I execute ./bin/nutch
> parsechecker http://nutch.apache.org/ it extracts the meta tags and prints
> them on standard output; how do I get Solr to index these meta tags?
> 2. Is it possible to integrate this with the default bin/crawl script, with
> some modifications? Running
> bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
> indexes the site content in Solr, but not the metadata.
>
> Can anyone please help me? Thanks in advance.
>

-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

