Hello,

I have tested the latest patch. Since I'm using Nutch 2.2.1, applying it was not straightforward, by which I mean it did not apply cleanly with:

$ patch < NUTCH-1478v6.patch

It is probably straightforward with the latest dev version of Nutch, so there is not too much to worry about. I managed to install the patch with a little manual work; everything looks OK, and parsechecker also gives the expected output.

However, I have ended up in the same position as before, with problems in Solr search. The meta tags are probably indexed in Solr but are not searchable. I'm wondering whether I need to use copyField to copy the metatags to the text field in Solr. What do you think?

My other question: in the Solr schema.xml you specify the dynamic field name as meta_*. Does that need to be metatag_*?

I appreciate the community's support on this.

Thanks,
Shanaka

On Wed, Mar 12, 2014 at 2:43 PM, Talat Uyarer <[email protected]> wrote:

> Hey Shanaka,
>
> This patch is based on the latest 2.x branch. You can download the code of
> the latest 2.x from GitHub [1] and then apply the patch.
>
> [1] https://github.com/apache/nutch/archive/2.x.zip
>
>
> 2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>
>> Hi Talat,
>>
>> I am trying your new patch. Do I still need to start with the zip
>> file, or is it sufficient to take the latest patch?
>>
>> Thanks,
>> Shanaka
>>
>>
>> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera
>> <[email protected]> wrote:
>>
>>> Hi Talat,
>>>
>>> Yes, I added it as follows:
>>>
>>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>> I also tried giving the full name as the field name.
>>>
>>> Thanks,
>>> Shanaka
>>>
>>>
>>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>>
>>>> Hi Shanaka,
>>>>
>>>> Did you add the meta field to your Solr schema?
>>>>
>>>> Talat
>>>>
>>>>
>>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>
>>>>> Hi Talat,
>>>>> How is the patch work going?
>>>>> I would appreciate your help; I am unable to proceed because the meta
>>>>> data is not getting indexed in Solr.
>>>>>
>>>>> Thanks,
>>>>> Shanaka
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Talat,
>>>>>>
>>>>>> Excellent news. Will you be able to prepare a patch file compatible
>>>>>> with Nutch 2.2.1 (the latest version)?
>>>>>> I will try your new patch.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Shanaka,
>>>>>>>
>>>>>>> Yes, a new patch is on the way. I hope to post it on the issue
>>>>>>> tonight. I cleaned up unnecessary code blocks, renamed methods,
>>>>>>> updated the Solr schema, etc. :)
>>>>>>>
>>>>>>> Talat
>>>>>>>
>>>>>>>
>>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>
>>>>>>> > Hi Talat,
>>>>>>> > Thanks a lot, I came this far because of your patch and explanation.
>>>>>>> > I've used the latest patch, which you published on 28/Feb/14 09:59.
>>>>>>> > Did you mean that a new patch is on the way?
>>>>>>> >
>>>>>>> > Thanks, Shanaka
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Hi Shanaka,
>>>>>>> > >
>>>>>>> > > I am developing NUTCH-1478, and it has some updates. If there is a
>>>>>>> > > problem, I will answer your questions after I update the patch.
>>>>>>> > > You can also review my last update :)
>>>>>>> > >
>>>>>>> > > Talat
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <
>>>>>>> > > [email protected]>:
>>>>>>> > >
>>>>>>> > > > Hello,
>>>>>>> > > >
>>>>>>> > > > I have configured Nutch 2.2.1 following the Nutch2Tutorial
>>>>>>> > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
>>>>>>> > > > with Solr 4.7, and it's working fine. Then I wanted to parse HTML
>>>>>>> > > > and index meta tags in Solr.
>>>>>>> > > > Since parse-metatags is not supported by default, I followed
>>>>>>> > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
>>>>>>> > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and
>>>>>>> > > > installed the patch NUTCH-1478v5.patch
>>>>>>> > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>>>>>>> > > >
>>>>>>> > > > I think I have installed it correctly, because I get the following
>>>>>>> > > > output when I try to parse a URL:
>>>>>>> > > >
>>>>>>> > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>> > > > fetching: http://nutch.apache.org/
>>>>>>> > > > parsing: http://nutch.apache.org/
>>>>>>> > > > contentType: text/html
>>>>>>> > > > signature: 030a8fe7684b5357663e041327e3d96b
>>>>>>> > > > ---------
>>>>>>> > > > Url
>>>>>>> > > > ---------------
>>>>>>> > > > http://nutch.apache.org/
>>>>>>> > > > ---------
>>>>>>> > > > Metadata
>>>>>>> > > > ---------
>>>>>>> > > > metatag.forrest-skin-name : nutch
>>>>>>> > > > metatag.forrest-version : 0.10-dev
>>>>>>> > > > metatag.generator : Apache Forrest
>>>>>>> > > > metatag.content-type : text/html; charset=UTF-8
>>>>>>> > > >
>>>>>>> > > > Now I am trying to index the meta data along with the other content
>>>>>>> > > > in Solr. I have updated the Solr schema.xml with
>>>>>>> > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>>>>> > > > to accept every generated field.
>>>>>>> > > >
>>>>>>> > > > My question is how to:
>>>>>>> > > > 1. Index meta data in Solr? When I execute ./bin/nutch parsechecker
>>>>>>> > > > http://nutch.apache.org/ it extracts the meta tags and prints them
>>>>>>> > > > to standard output. How do I ask Solr to index these metatags?
>>>>>>> > > > 2. Is it possible to integrate this with the default bin/crawl
>>>>>>> > > > script, with modifications?
>>>>>>> > > > bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>>> > > > This indexes the site's content in Solr, but not the meta data.
>>>>>>> > > >
>>>>>>> > > > Can anyone please help me? Thanks in advance.
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Talat UYARER
>>>>>>> > > Website: http://talat.uyarer.com
>>>>>>> > > Twitter: http://twitter.com/talatuyarer
>>>>>>> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>> > >
>>>>>>> >
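
For reference, the schema.xml change asked about at the top of this message could look roughly like the fragment below. This is only a sketch under some assumptions: that the indexer sends fields prefixed with meta_ (I have not confirmed which prefix the patch emits, so the pattern may need to be metatag_* instead), and that the catch-all search field is named text, as in the example schema shipped with Solr 4.x. As far as I know, Solr only treats a name as a wildcard pattern on <dynamicField>, not on a plain <field>, so the pattern is declared that way here.

```xml
<!-- Inside <fields> in schema.xml.
     Assumption: indexed meta fields arrive with the prefix "meta_";
     change the pattern to "metatag_*" if that is what the indexer sends. -->
<dynamicField name="meta_*" type="string" indexed="true" stored="true"/>

<!-- Outside <fields>, next to the existing copyField rules.
     Assumption: the default search field is called "text" and is multiValued,
     so that it can receive values copied from several source fields. -->
<copyField source="meta_*" dest="text"/>
```

Because the type is string, the values are indexed verbatim, so a query on a field such as meta_generator (a hypothetical name, following the assumed prefix) would only match the exact value "Apache Forrest". The copyField into a tokenized text field is what would make individual words searchable. copyField is applied at index time, so the core has to be reloaded and the documents re-indexed before the change shows up in search.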

