Hi Shanaka,

I uploaded a new patch for NUTCH-1478. If you have time, can you review it?
Thanks

2014-03-11 17:21 GMT+02:00 Shanaka Jayasundera <[email protected]>:

> Hi Talat,
>
> Excellent news. Will you be able to prepare the patch file compatible with
> Nutch 2.2.1 (latest version)?
> I will try your new patch.
>
> Thanks,
>
> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>
> > Hi Shanaka,
> >
> > Yes, a new patch is on the way. I hope to post it on the issue tonight.
> > I cleaned up unnecessary code blocks, renamed methods, updated the Solr
> > schema, etc. :)
> >
> > Talat
> >
> > 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
> >
> > > Hi Talat,
> > > Thanks a lot. I came this far because of your patch and explanation.
> > > I've used the latest patch you published on 28/Feb/14 09:59. Do you
> > > mean that a new patch is on the way?
> > >
> > > Thanks, Shanaka
> > >
> > > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
> > >
> > > > Hi Shanaka,
> > > >
> > > > I am developing NUTCH-1478. It has some updates. If it is a problem,
> > > > I will answer your questions after my updated patch. You can also
> > > > review my last update :)
> > > >
> > > > Talat
> > > >
> > > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have configured Nutch 2.2.1 following Nutch2Tutorial
> > > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
> > > > > with Solr 4.7, and it's working fine. Then I wanted to parse HTML
> > > > > and index meta tags in Solr.
> > > > >
> > > > > Since parse-metatags is not supported by default, I followed
> > > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
> > > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and installed
> > > > > the patch NUTCH-1478v5.patch
> > > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
> > > > >
> > > > > I think I have installed it correctly because I get the following
> > > > > output when I try to parse a URL:
> > > > >
> > > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
> > > > > fetching: http://nutch.apache.org/
> > > > > parsing: http://nutch.apache.org/
> > > > > contentType: text/html
> > > > > signature: 030a8fe7684b5357663e041327e3d96b
> > > > > ---------
> > > > > Url
> > > > > ---------------
> > > > > http://nutch.apache.org/
> > > > > ---------
> > > > > Metadata
> > > > > ---------
> > > > > metatag.forrest-skin-name : nutch
> > > > > metatag.forrest-version : 0.10-dev
> > > > > metatag.generator : Apache Forrest
> > > > > metatag.content-type : text/html; charset=UTF-8
> > > > >
> > > > > Now I am trying to index the metadata along with other content to
> > > > > Solr. I have updated the Solr schema.xml with
> > > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
> > > > > to accept every generated field.
> > > > >
> > > > > My questions are:
> > > > > 1. How do I index the metadata in Solr? When I execute
> > > > >    ./bin/nutch parsechecker http://nutch.apache.org/
> > > > >    it extracts the meta tags and prints them on standard output.
> > > > >    How do I ask Solr to index these meta tags?
> > > > > 2. Is it possible to integrate this with the default bin/crawl
> > > > >    script, with modifications?
> > > > >    bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/1
> > > > >    This indexes the sites' content in Solr but not the metadata.
> > > > >
> > > > > Can anyone please help me? Thanks in advance.

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

