Hi, is there a way to see what Nutch is sending to Solr for indexing?
I am trying to debug whether the issue is on the Nutch side or the Solr side.
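
A minimal sketch of one way to check the Solr side, assuming Solr answers at http://localhost:8983/solr (the URL passed to bin/crawl elsewhere in this thread) and that the meta tag fields start with meta_ or metatag (both prefixes are guesses taken from the discussion below, not confirmed field names). The helper names are hypothetical. It asks the standard /select handler for a few documents and lists which of their stored fields carry those prefixes; if none show up, the fields probably never reached Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr base URL as used in this thread's bin/crawl call.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(base, query="*:*", rows=5):
    """Build a /select URL that returns all stored fields as JSON."""
    params = urlencode({"q": query, "fl": "*", "rows": rows, "wt": "json"})
    return base.rstrip("/") + "/select?" + params


def fields_with_prefix(doc, prefixes=("meta_", "metatag")):
    """Return the sorted field names of one Solr document that start with a prefix."""
    return sorted(name for name in doc if name.startswith(tuple(prefixes)))


def fetch_docs(url):
    """Fetch the documents from a Solr JSON response (needs a running Solr)."""
    with urlopen(url) as response:
        return json.load(response)["response"]["docs"]


# Usage against a running Solr instance:
#   for doc in fetch_docs(build_select_url(SOLR_BASE)):
#       print(doc.get("id"), fields_with_prefix(doc))
```

Only stored fields come back in the response, so this shows what Solr kept, not whether a field is searchable.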

Thanks,
Shanaka


On Thu, Mar 13, 2014 at 9:15 AM, Shanaka Jayasundera <[email protected]> wrote:

> Hello,
>
> I have tested the latest patch. Since I'm using Nutch 2.2.1, applying the
> patch was not straightforward.
>
> I mean using:
>  $ patch < NUTCH-1478v6.patch
>
> It is probably straightforward with the latest dev version of Nutch, so not
> to worry too much. I managed to apply the patch with a little manual work,
> everything looks OK, and parsechecker is also giving the expected output.
>
> Anyway, I came to the same position where I had issues with Solr search.
> The meta tags are probably indexed in Solr but not searchable.
> I'm wondering whether I need to use copyField to copy the meta tags to the
> text field in Solr. What do you think?
>
> My other question: in the Solr schema.xml you specify the dynamic field name
> as meta_*. Does that need to be metatag_*?
>
> I appreciate the community's support on this.
>
> Thanks,
> Shanaka
>
>
> On Wed, Mar 12, 2014 at 2:43 PM, Talat Uyarer <[email protected]> wrote:
>
>> Hey Shanaka,
>>
>> This patch is based on the latest 2.x branch. You can download the code of
>> the latest 2.x from GitHub [1] and then apply the patch.
>>
>> [1] https://github.com/apache/nutch/archive/2.x.zip
>>
>>
>> 2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>
>> Hi Talat,
>>>
>>> I am trying your new patch. Do I still need to start with the zip
>>> file, or is it sufficient to take the latest patch?
>>>
>>> Thanks,
>>> Shanaka
>>>
>>>
>>> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera <[email protected]> wrote:
>>>
>>>> Hi Talat,
>>>>
>>>> Yes, I added it as follows:
>>>>
>>>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>> I also tried giving the full name as the field name.
>>>>
>>>> Thanks,
>>>> Shanaka
>>>>
>>>>
>>>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>>>
>>>>> Hi Shanaka,
>>>>>
>>>>> Did you add the meta field to your Solr schema?
>>>>>
>>>>> Talat
>>>>>
>>>>>
>>>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>
>>>>> Hi Talat,
>>>>>> How is the patch work going?
>>>>>> I would appreciate your help. I am unable to proceed because the
>>>>>> metadata is not getting indexed in Solr.
>>>>>>
>>>>>> Thanks,
>>>>>> Shanaka
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Talat,
>>>>>>>
>>>>>>> Excellent news. Will you be able to prepare a patch file
>>>>>>> compatible with Nutch 2.2.1 (the latest version)?
>>>>>>> I will try your new patch.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Shanaka,
>>>>>>>>
>>>>>>>> Yes, a new patch is on the way. I hope to post it on the issue
>>>>>>>> tonight. I cleaned up unnecessary code blocks, renamed methods,
>>>>>>>> updated the Solr schema, etc. :)
>>>>>>>>
>>>>>>>> Talat
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>>
>>>>>>>> > Hi Talat,
>>>>>>>> > Thanks a lot. I came this far because of your patch and
>>>>>>>> > explanation. I've used the latest patch you published on
>>>>>>>> > 28/Feb/14 09:59. Do you mean that a new patch is on the way?
>>>>>>>> >
>>>>>>>> > Thanks, Shanaka
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > > Hi Shanaka,
>>>>>>>> > >
>>>>>>>> > > I am developing NUTCH-1478, and it has some updates. If there is
>>>>>>>> > > a problem, I will answer your questions after I update the patch.
>>>>>>>> > > You can also review my last update :)
>>>>>>>> > >
>>>>>>>> > > Talat
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>> > >
>>>>>>>> > > > Hello,
>>>>>>>> > > >
>>>>>>>> > > > I have configured Nutch 2.2.1 following the Nutch2Tutorial
>>>>>>>> > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
>>>>>>>> > > > with Solr 4.7, and it's working fine. Then I wanted to parse HTML
>>>>>>>> > > > and index meta tags in Solr.
>>>>>>>> > > > Since parse-metatags is not supported by default, I followed
>>>>>>>> > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
>>>>>>>> > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and
>>>>>>>> > > > installed the patch NUTCH-1478v5.patch
>>>>>>>> > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>>>>>>>> > > >
>>>>>>>> > > > I think I have installed it correctly, because I get the following
>>>>>>>> > > > output when I try to parse a URL:
>>>>>>>> > > >
>>>>>>>> > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>>> > > > fetching: http://nutch.apache.org/
>>>>>>>> > > > parsing: http://nutch.apache.org/
>>>>>>>> > > > contentType: text/html
>>>>>>>> > > > signature: 030a8fe7684b5357663e041327e3d96b
>>>>>>>> > > > ---------
>>>>>>>> > > > Url
>>>>>>>> > > > ---------------
>>>>>>>> > > > http://nutch.apache.org/
>>>>>>>> > > > ---------
>>>>>>>> > > > Metadata
>>>>>>>> > > > ---------
>>>>>>>> > > > metatag.forrest-skin-name :     nutch
>>>>>>>> > > > metatag.forrest-version :     0.10-dev
>>>>>>>> > > > metatag.generator :     Apache Forrest
>>>>>>>> > > > metatag.content-type :     text/html; charset=UTF-8
>>>>>>>> > > >
>>>>>>>> > > > Now I am trying to index the metadata along with the other content
>>>>>>>> > > > in Solr. I have updated the Solr schema.xml with
>>>>>>>> > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>>>>>> > > > to accept every generated field.
>>>>>>>> > > >
>>>>>>>> > > > My questions are:
>>>>>>>> > > > 1. How do I index metadata in Solr? When I execute ./bin/nutch
>>>>>>>> > > > parsechecker http://nutch.apache.org/ it extracts the meta tags
>>>>>>>> > > > and prints them on standard output. How do I ask Solr to index
>>>>>>>> > > > these meta tags?
>>>>>>>> > > > 2. Is it possible to integrate this with the default bin/crawl
>>>>>>>> > > > script, with modifications?
>>>>>>>> > > >     bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>>>> > > >     This indexes the sites' content in Solr, but not the metadata.
>>>>>>>> > > >
>>>>>>>> > > > Can anyone please help me? Thanks in advance.
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > --
>>>>>>>> > > Talat UYARER
>>>>>>>> > > Website: http://talat.uyarer.com
>>>>>>>> > > Twitter: http://twitter.com/talatuyarer
>>>>>>>> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>>> > >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Talat UYARER
>>>>>>>> Website: http://talat.uyarer.com
>>>>>>>> Twitter: http://twitter.com/talatuyarer
>>>>>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Talat UYARER
>>>>> Website: http://talat.uyarer.com
>>>>> Twitter: http://twitter.com/talatuyarer
>>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Talat UYARER
>> Website: http://talat.uyarer.com
>> Twitter: http://twitter.com/talatuyarer
>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>
>
>
