Hi,

Is there a way to see what Nutch is sending to Solr to index? I am trying to debug whether the issue is on the Nutch side or the Solr side.
Thanks,
Shanaka

On Thu, Mar 13, 2014 at 9:15 AM, Shanaka Jayasundera <[email protected]> wrote:

> Hello ,
>
> I have tested the latest patch, Since I'm using Nutch 2.2.1, patch
> installation was not straight forward,
>
> I mean using,
> $patch < NUTCH-1478v6.patch
>
> Probably it's straight forward with latest dev version on nutch so not to
> worry to much and I manage to installed the patch with few manual work and
> everything looks ok and parcechecker is also giving expected output.
>
> Anyway I came to the same possession where I got issues with Solr search.
> Probably meta tags are indexed on Solr but not searchable.
> I'm wondering do i need to use copyfield to copy metatags to text field in
> solr. What you think ?
>
> My other question is, on solr, schema.xml you specify dynamic name as
> meta_*, is that needs to be metatag_* ?
>
> Appreciate community support on this.
>
> Thanks,
> Shanaka
>
>
> On Wed, Mar 12, 2014 at 2:43 PM, Talat Uyarer <[email protected]> wrote:
>
>> Hey Shanaka,
>>
>> This patch based on lastest 2.x branch. You can download code of lastest
>> 2.x from github[1] Then you apply the patch.
>>
>> [1] https://github.com/apache/nutch/archive/2.x.zip
>>
>>
>> 2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>
>>> Hi Talat,
>>>
>>> I am trying your new patch, do i need to still need to start with zip
>>> file or its sufficient to take latest patch ?
>>>
>>> Thanks,
>>> Shanaka
>>>
>>>
>>> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera <[email protected]> wrote:
>>>
>>>> Hi Talat,
>>>>
>>>> Yes I add like following,
>>>>
>>>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>> Also I tried giving the full name as field name.
>>>>
>>>> Thanks,
>>>> Shanaka
>>>>
>>>>
>>>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>>>
>>>>> Hi Shanaka,
>>>>>
>>>>> Did you add meta field your schema of solr ?
>>>>>
>>>>> Talat
>>>>>
>>>>>
>>>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>
>>>>>> Hi Talat,
>>>>>> How patch work going on ?
>>>>>> Appreciate if you can help me, I am unable to proceed because meta
>>>>>> data is not getting indexed on solr.
>>>>>>
>>>>>> Thanks,
>>>>>> Shanaka
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Talat,
>>>>>>>
>>>>>>> Excellent news, Will you be able to prepare the patch file
>>>>>>> compatible with Nutch 2.2.1 ( Latest Version) ?
>>>>>>> I will try your new patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Shanaka,
>>>>>>>>
>>>>>>>> Yes. New patch is on the way. I hope I will send on the issue tonight. I
>>>>>>>> clean unnesssary code blocks, rename methods, update solr schema etc. :)
>>>>>>>>
>>>>>>>> Talat
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>>
>>>>>>>> > Hi Talat,
>>>>>>>> > Thanks lot, I came this far because of your Patch and explanation. I've
>>>>>>>> > used latest patch you have published on 28/Feb/14 09:59, You meant to say
>>>>>>>> > new patch is on the way ?
>>>>>>>> >
>>>>>>>> > Thanks, Shanaka
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > > Hi Shanaka,
>>>>>>>> > >
>>>>>>>> > > I develop NUTCH-1478. It has some updates. If it will be problem, I will
>>>>>>>> > > answer your questions after my update patch. Also you can review my last
>>>>>>>> > > update :)
>>>>>>>> > >
>>>>>>>> > > Talat
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>> > >
>>>>>>>> > > > Hello ,
>>>>>>>> > > >
>>>>>>>> > > > I have configure Nutch 2.2.1 following Nutch2Tutorial
>>>>>>>> > > > <https://wiki.apache.org/nutch/Nutch2Tutorial>and integrated with Solr 4.7
>>>>>>>> > > > and it's working fine. Then I wanted to parse HTML and index meta tags in
>>>>>>>> > > > solr.
>>>>>>>> > > > Since Parse-metatags is not supported by default I follow "Parse-metatags
>>>>>>>> > > > and index-metadata plugin for Nutch 2.x
>>>>>>>> > > > series<https://issues.apache.org/jira/browse/NUTCH-1478>" and
>>>>>>>> > > > installed patchNUTCH-1478v5.patc.<https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>
>>>>>>>> > > >
>>>>>>>> > > > I think I have install it correctly because i get following out put when I
>>>>>>>> > > > try to parch a URL
>>>>>>>> > > >
>>>>>>>> > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>>> > > > fetching: http://nutch.apache.org/
>>>>>>>> > > > parsing: http://nutch.apache.org/
>>>>>>>> > > > contentType: text/html
>>>>>>>> > > > signature: 030a8fe7684b5357663e041327e3d96b
>>>>>>>> > > > ---------
>>>>>>>> > > > Url
>>>>>>>> > > > ---------------
>>>>>>>> > > > http://nutch.apache.org/
>>>>>>>> > > > ---------
>>>>>>>> > > > Metadata
>>>>>>>> > > > ---------
>>>>>>>> > > > metatag.forrest-skin-name : nutch
>>>>>>>> > > > metatag.forrest-version : 0.10-dev
>>>>>>>> > > > metatag.generator : Apache Forrest
>>>>>>>> > > > metatag.content-type : text/html; charset=UTF-8
>>>>>>>> > > >
>>>>>>>> > > > Now I am Trying to index meta data along with other content to Solr, I have
>>>>>>>> > > > update solr schema.xml with <field name="meta_*" type="string"
>>>>>>>> > > > stored="true" indexed="true"/> to accept every generated fields.
>>>>>>>> > > >
>>>>>>>> > > > My questing is how to
>>>>>>>> > > > 1. Index meta data in Solr ? When I execute ./bin/nutch parsechecker
>>>>>>>> > > > http://nutch.apache.org/ it will extract and give the meta tags on
>>>>>>>> > > > standard output, how to ask solr to index these metatags.
>>>>>>>> > > > 2. Is it possible to integrate with bit/crawl default script with
>>>>>>>> > > > modifications
>>>>>>>> > > > bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>>>> > > > This will index sites content on solr but not the meta data
>>>>>>>> > > >
>>>>>>>> > > > Can any one please help me , Thanks in Advance.
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > > --
>>>>>>>> > > Talat UYARER
>>>>>>>> > > Websitesi: http://talat.uyarer.com
>>>>>>>> > > Twitter: http://twitter.com/talatuyarer
>>>>>>>> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>>> > >
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Talat UYARER
>>>>>>>> Websitesi: http://talat.uyarer.com
>>>>>>>> Twitter: http://twitter.com/talatuyarer
>>>>>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>>>
>>>>>
>>>>> --
>>>>> Talat UYARER
>>>>> Websitesi: http://talat.uyarer.com
>>>>> Twitter: http://twitter.com/talatuyarer
>>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>
>>
>> --
>> Talat UYARER
>> Websitesi: http://talat.uyarer.com
>> Twitter: http://twitter.com/talatuyarer
>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>
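
For reference, one way to narrow this down from the Solr side is to ask Solr for a few documents with all stored fields and look for the meta_* names. The sketch below is only an illustration under assumptions: the base URL is the one passed to bin/crawl earlier in the thread (a core name may need to be added to it), the unique key is assumed to be "id", and the helper names (build_select_url, fields_with_prefix, check_solr) are made up for this example. It only shows stored fields, so a field that is indexed but not stored will not appear.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base, rows=5):
    """Build a Solr select URL that returns all stored fields of a few documents."""
    params = urlencode({"q": "*:*", "fl": "*", "rows": rows, "wt": "json"})
    return solr_base.rstrip("/") + "/select?" + params


def fields_with_prefix(response, prefix):
    """Map each document's id to the stored field names that start with prefix."""
    found = {}
    for doc in response.get("response", {}).get("docs", []):
        names = sorted(name for name in doc if name.startswith(prefix))
        # "id" as the unique key is an assumption about the schema in use.
        found[doc.get("id", "<no id>")] = names
    return found


def check_solr(solr_base, prefix):
    """Query a running Solr instance and print the matching field names per document."""
    url = build_select_url(solr_base)
    with urlopen(url) as handle:
        response = json.loads(handle.read().decode("utf-8"))
    for doc_id, names in fields_with_prefix(response, prefix).items():
        print(doc_id, names if names else "no fields with prefix " + prefix)
```

Calling check_solr("http://localhost:8983/solr", "meta_") against a running instance prints, per document, which meta_* fields were stored. If documents arrive without any such fields, the problem is more likely on the Nutch indexing side; if the fields are there but searches do not match, the schema (field types, copyField) is the next place to look.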

