Hey Shanaka,

This patch is based on the latest 2.x branch. You can download the latest 2.x code from GitHub [1] and then apply the patch to it.
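
In case it helps, the download-and-patch steps might look roughly like the sketch below. It is not a tested recipe: the unpacked directory name, the patch file name (NUTCH-1478.patch here is a placeholder for whatever you saved from the issue) and the -p0 strip level are all assumptions. By default the script only prints the commands; run it with DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: prints each command by default; set DRY_RUN=0 to execute them.
set -e

NUTCH_ZIP_URL="https://github.com/apache/nutch/archive/2.x.zip"  # link [1]
SRC_DIR="nutch-2.x"               # assumed name of the unpacked directory
PATCH_FILE="../NUTCH-1478.patch"  # hypothetical local name of the patch file

# Echo the command in dry-run mode, otherwise run it in the current shell.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run wget -O 2.x.zip "$NUTCH_ZIP_URL"  # fetch the latest 2.x sources
run unzip 2.x.zip
run cd "$SRC_DIR"
run patch -p0 -i "$PATCH_FILE"        # -p0 is an assumption; match the paths in the patch
run ant runtime                       # rebuild Nutch so the patched plugins are included
```

If patch reports that it cannot find the files to patch, the strip level is the first thing to check (-p1 instead of -p0).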

[1] https://github.com/apache/nutch/archive/2.x.zip

2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:

> Hi Talat,
>
> I am trying your new patch. Do I still need to start with the zip file,
> or is it sufficient to take the latest patch?
>
> Thanks,
> Shanaka
>
> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera
> <[email protected]> wrote:
>
>> Hi Talat,
>>
>> Yes, I added it as follows:
>>
>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>
>> I also tried giving the full name as the field name.
>>
>> Thanks,
>> Shanaka
>>
>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>
>>> Hi Shanaka,
>>>
>>> Did you add the meta field to your Solr schema?
>>>
>>> Talat
>>>
>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>
>>>> Hi Talat,
>>>>
>>>> How is the patch work going?
>>>> I would appreciate your help; I am unable to proceed because the
>>>> metadata is not getting indexed in Solr.
>>>>
>>>> Thanks,
>>>> Shanaka
>>>>
>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Talat,
>>>>>
>>>>> Excellent news. Will you be able to prepare a patch file compatible
>>>>> with Nutch 2.2.1 (the latest version)?
>>>>> I will try your new patch.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>
>>>>>> Hi Shanaka,
>>>>>>
>>>>>> Yes, a new patch is on the way. I hope to post it on the issue
>>>>>> tonight. I cleaned up unnecessary code blocks, renamed methods,
>>>>>> updated the Solr schema, etc. :)
>>>>>>
>>>>>> Talat
>>>>>>
>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>
>>>>>>> Hi Talat,
>>>>>>>
>>>>>>> Thanks a lot, I came this far because of your patch and explanation.
>>>>>>> I've used the latest patch, which you published on 28/Feb/14 09:59.
>>>>>>> Did you mean to say that a new patch is on the way?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shanaka
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Shanaka,
>>>>>>>>
>>>>>>>> I am developing NUTCH-1478, and it has some updates coming. If that
>>>>>>>> is a problem, I will answer your questions after I update the patch.
>>>>>>>> You can also review my last update :)
>>>>>>>>
>>>>>>>> Talat
>>>>>>>>
>>>>>>>> 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have configured Nutch 2.2.1 following the Nutch2Tutorial
>>>>>>>>> <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
>>>>>>>>> with Solr 4.7, and it's working fine. Then I wanted to parse HTML
>>>>>>>>> and index meta tags in Solr.
>>>>>>>>>
>>>>>>>>> Since parse-metatags is not supported by default, I followed
>>>>>>>>> "Parse-metatags and index-metadata plugin for Nutch 2.x series"
>>>>>>>>> <https://issues.apache.org/jira/browse/NUTCH-1478> and installed
>>>>>>>>> the patch NUTCH-1478v5.patch
>>>>>>>>> <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>>>>>>>>>
>>>>>>>>> I think I have installed it correctly, because I get the following
>>>>>>>>> output when I try to parse a URL:
>>>>>>>>>
>>>>>>>>> $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>>>> fetching: http://nutch.apache.org/
>>>>>>>>> parsing: http://nutch.apache.org/
>>>>>>>>> contentType: text/html
>>>>>>>>> signature: 030a8fe7684b5357663e041327e3d96b
>>>>>>>>> ---------
>>>>>>>>> Url
>>>>>>>>> ---------------
>>>>>>>>> http://nutch.apache.org/
>>>>>>>>> ---------
>>>>>>>>> Metadata
>>>>>>>>> ---------
>>>>>>>>> metatag.forrest-skin-name : nutch
>>>>>>>>> metatag.forrest-version : 0.10-dev
>>>>>>>>> metatag.generator : Apache Forrest
>>>>>>>>> metatag.content-type : text/html; charset=UTF-8
>>>>>>>>>
>>>>>>>>> Now I am trying to index the metadata along with the other content
>>>>>>>>> in Solr. I have updated the Solr schema.xml with
>>>>>>>>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>>>>>>> to accept every generated field.
>>>>>>>>>
>>>>>>>>> My questions are:
>>>>>>>>>
>>>>>>>>> 1. How do I index the metadata in Solr? When I execute
>>>>>>>>>    ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>>>>    it extracts the meta tags and prints them on standard output.
>>>>>>>>>    How do I ask Solr to index these meta tags?
>>>>>>>>> 2. Is it possible to integrate this with the default bin/crawl
>>>>>>>>>    script, with modifications?
>>>>>>>>>    bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>>>>>    This indexes the sites' content in Solr, but not the metadata.
>>>>>>>>>
>>>>>>>>> Can anyone please help me? Thanks in advance.

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

