Hi Shanaka,

I am the developer of NUTCH-1478. It has some updates. If that is a problem,
I will answer your questions after I update the patch. You can also review my
latest update :)

Talat


2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:

> Hello,
>
> I have configured Nutch 2.2.1 following the Nutch2Tutorial
> <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it with Solr 4.7,
> and it's working fine. Then I wanted to parse HTML and index meta tags in
> Solr.
> Since parse-metatags is not supported by default, I followed "Parse-metatags
> and index-metadata plugin for Nutch 2.x series"
> <https://issues.apache.org/jira/browse/NUTCH-1478> and
> installed the patch NUTCH-1478v5.patch
> <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>
> I think I have installed it correctly because I get the following output when I
> try to parse a URL:
>
> $ ./bin/nutch parsechecker http://nutch.apache.org/
> fetching: http://nutch.apache.org/
> parsing: http://nutch.apache.org/
> contentType: text/html
> signature: 030a8fe7684b5357663e041327e3d96b
> ---------
> Url
> ---------------
> http://nutch.apache.org/
> ---------
> Metadata
> ---------
> metatag.forrest-skin-name :     nutch
> metatag.forrest-version :     0.10-dev
> metatag.generator :     Apache Forrest
> metatag.content-type :     text/html; charset=UTF-8
>
> Now I am trying to index the metadata along with other content in Solr. I have
> updated the Solr schema.xml with <field name="meta_*" type="string"
> stored="true" indexed="true"/> to accept every generated field.
>
> My questions are:
> 1. How do I index metadata in Solr? When I execute ./bin/nutch parsechecker
> http://nutch.apache.org/ it extracts the meta tags and prints them on
> standard output. How do I ask Solr to index these meta tags?
> 2. Is it possible to integrate this with the default bin/crawl script with
> modifications?
>     bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>     This indexes the site's content in Solr but not the metadata.
>
> Can anyone please help me? Thanks in advance.
>



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
