Hello,

I have tested the latest patch. Since I'm using Nutch 2.2.1, applying the
patch was not straightforward, I mean by simply running:
 $ patch < NUTCH-1478v6.patch

It probably applies cleanly to the latest dev version of Nutch, so that is
not much of a worry. I managed to apply the patch with a little manual work;
everything looks OK and parsechecker is also giving the expected output.

Anyway, I ended up in the same position as before: I have issues with Solr
search. The meta tags are probably indexed in Solr but are not searchable.
I'm wondering whether I need to use copyField to copy the metatags into the
text field in Solr. What do you think?

My other question: in the Solr schema.xml you specify the dynamic field name
as meta_*. Does that need to be metatag_*?
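
To make both questions concrete, here is a sketch of the schema.xml entries I
have in mind. The meta_* prefix, the multiValued attribute, and the
destination field name text are assumptions on my side and are not confirmed
by the patch. As far as I know, Solr only honors a wildcard name on a
dynamicField element, not on a plain field element:

```xml
<!-- Sketch only: the prefix and the destination field are assumptions;
     adjust them to whatever field names the indexer actually emits. -->
<dynamicField name="meta_*" type="string" stored="true" indexed="true" multiValued="true"/>

<!-- Copies every meta_* value into the catch-all field so that a default
     search can match it. -->
<copyField source="meta_*" dest="text"/>
```

If the indexer really emits metatag_* names, the same two lines would apply
with that prefix instead.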

I would appreciate the community's support on this.

Thanks,
Shanaka

On Wed, Mar 12, 2014 at 2:43 PM, Talat Uyarer <[email protected]> wrote:

> Hey Shanaka,
>
> This patch is based on the latest 2.x branch. You can download the latest
> 2.x code from GitHub [1] and then apply the patch.
>
> [1] https://github.com/apache/nutch/archive/2.x.zip
>
>
> 2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>
>> Hi Talat,
>>
>> I am trying your new patch. Do I still need to start from the zip
>> file, or is it sufficient to apply the latest patch?
>>
>> Thanks,
>> Shanaka
>>
>>
>> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera
>> <[email protected]> wrote:
>>
>>> Hi Talat,
>>>
>>> Yes, I added it as follows:
>>>
>>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>>> I also tried giving the full name as the field name.
>>>
>>> Thanks,
>>> Shanaka
>>>
>>>
>>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>>
>>>> Hi Shanaka,
>>>>
>>>> Did you add the meta field to your Solr schema?
>>>>
>>>> Talat
>>>>
>>>>
>>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>
>>>>> Hi Talat,
>>>>> How is the patch work going?
>>>>> I would appreciate your help; I am unable to proceed because the meta
>>>>> data is not getting indexed in Solr.
>>>>>
>>>>> Thanks,
>>>>> Shanaka
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Talat,
>>>>>>
>>>>>> Excellent news. Will you be able to prepare a patch file compatible
>>>>>> with Nutch 2.2.1 (the latest version)?
>>>>>> I will try your new patch.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Shanaka,
>>>>>>>
>>>>>>> Yes, a new patch is on the way. I hope to post it on the issue
>>>>>>> tonight. I cleaned up unnecessary code blocks, renamed methods,
>>>>>>> updated the Solr schema, etc. :)
>>>>>>>
>>>>>>> Talat
>>>>>>>
>>>>>>>
>>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>>
>>>>>>> > Hi Talat,
>>>>>>> > Thanks a lot, I came this far because of your patch and explanation.
>>>>>>> > I've used the latest patch you published on 28/Feb/14 09:59. Did you
>>>>>>> > mean to say that a new patch is on the way?
>>>>>>> >
>>>>>>> > Thanks, Shanaka
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
>>>>>>> >
>>>>>>> > > Hi Shanaka,
>>>>>>> > >
>>>>>>> > > I am developing NUTCH-1478, and it has some updates coming. If that
>>>>>>> > > is a problem, I will answer your questions after I update the patch.
>>>>>>> > > You can also review my last update :)
>>>>>>> > >
>>>>>>> > > Talat
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>> > >
>>>>>>> > > > Hello ,
>>>>>>> > > >
>>>>>>> > > > I have configured Nutch 2.2.1 following the Nutch2Tutorial
>>>>>>> > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
>>>>>>> > > > with Solr 4.7, and it's working fine. Then I wanted to parse HTML
>>>>>>> > > > and index meta tags in Solr.
>>>>>>> > > > Since parse-metatags is not supported by default, I followed
>>>>>>> > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
>>>>>>> > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and installed
>>>>>>> > > > the patch NUTCH-1478v5.patch
>>>>>>> > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>>>>>>> > > >
>>>>>>> > > > I think I have installed it correctly, because I get the following
>>>>>>> > > > output when I try to parse a URL:
>>>>>>> > > >
>>>>>>> > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>> > > > fetching: http://nutch.apache.org/
>>>>>>> > > > parsing: http://nutch.apache.org/
>>>>>>> > > > contentType: text/html
>>>>>>> > > > signature: 030a8fe7684b5357663e041327e3d96b
>>>>>>> > > > ---------
>>>>>>> > > > Url
>>>>>>> > > > ---------------
>>>>>>> > > > http://nutch.apache.org/
>>>>>>> > > > ---------
>>>>>>> > > > Metadata
>>>>>>> > > > ---------
>>>>>>> > > > metatag.forrest-skin-name :     nutch
>>>>>>> > > > metatag.forrest-version :     0.10-dev
>>>>>>> > > > metatag.generator :     Apache Forrest
>>>>>>> > > > metatag.content-type :     text/html; charset=UTF-8
>>>>>>> > > >
>>>>>>> > > > Now I am trying to index the meta data along with the other content
>>>>>>> > > > in Solr. I have updated the Solr schema.xml with
>>>>>>> > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>>>>> > > > to accept every generated field.
>>>>>>> > > >
>>>>>>> > > > My questions are:
>>>>>>> > > > 1. How do I index meta data in Solr? When I execute
>>>>>>> > > >    ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>>> > > >    it extracts the meta tags and prints them on standard output.
>>>>>>> > > >    How do I ask Solr to index these metatags?
>>>>>>> > > > 2. Is it possible to integrate this with the default bin/crawl
>>>>>>> > > >    script, with modifications?
>>>>>>> > > >     bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>>> > > >    This indexes the sites' content in Solr but not the meta data.
>>>>>>> > > >
>>>>>>> > > > Can anyone please help me? Thanks in advance.
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Talat UYARER
>>>>>>> > > Websitesi: http://talat.uyarer.com
>>>>>>> > > Twitter: http://twitter.com/talatuyarer
>>>>>>> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>
