Hey Shanaka,

This patch is based on the latest 2.x branch. You can download the latest
2.x code from GitHub [1] and then apply the patch.

[1] https://github.com/apache/nutch/archive/2.x.zip
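
The steps above could be sketched roughly as follows. This is only a sketch:
the patch file name (NUTCH-1478.patch), the -p0 strip level, and the nutch-2.x
directory name are assumptions, not details from this thread, so adjust them
to match the file you download from the issue. The patch file is assumed to
sit next to the zip. By default the script only prints the commands; run it
with DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only. PATCH_FILE, the -p0 strip level and the nutch-2.x directory
# name are assumptions; adjust them to match the patch you downloaded.
set -eu

ZIP_URL="https://github.com/apache/nutch/archive/2.x.zip"
SRC_DIR="nutch-2.x"                            # assumed name of the unpacked directory
PATCH_FILE="${PATCH_FILE:-NUTCH-1478.patch}"   # hypothetical file name
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run wget -O 2.x.zip "$ZIP_URL"                # fetch the latest 2.x sources
run unzip -q 2.x.zip                          # unpack them
run cd "$SRC_DIR"                             # work inside the source tree
run patch -p0 --dry-run -i "../$PATCH_FILE"   # check that the patch applies cleanly
run patch -p0 -i "../$PATCH_FILE"             # apply the patch
run ant runtime                               # rebuild Nutch
```

The --dry-run pass (a GNU patch option) checks that the patch applies cleanly
before any file is modified; because of set -e, the script stops there if it
does not.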


2014-03-12 16:08 GMT+02:00 Shanaka Jayasundera <[email protected]>:

> Hi Talat,
>
> I am trying your new patch. Do I still need to start with the zip file,
> or is it sufficient to apply the latest patch?
>
> Thanks,
> Shanaka
>
>
> On Wed, Mar 12, 2014 at 7:57 AM, Shanaka Jayasundera <[email protected]> wrote:
>
>> Hi Talat,
>>
>> Yes, I added it as follows:
>>
>> <field name="meta_*" type="string" stored="true" indexed="true"/>
>> I also tried giving the full name as the field name.
>>
>> Thanks,
>> Shanaka
>>
>>
>> On Wed, Mar 12, 2014 at 7:52 AM, Talat Uyarer <[email protected]> wrote:
>>
>>> Hi Shanaka,
>>>
>>> Did you add the meta field to your Solr schema?
>>>
>>> Talat
>>>
>>>
>>> 2014-03-12 13:25 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>
>>>> Hi Talat,
>>>> How is the patch work going?
>>>> I would appreciate your help. I am unable to proceed because the metadata
>>>> is not getting indexed in Solr.
>>>>
>>>> Thanks,
>>>> Shanaka
>>>>
>>>>
>>>> On Tue, Mar 11, 2014 at 11:21 AM, Shanaka Jayasundera <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Talat,
>>>>>
>>>>> Excellent news. Will you be able to prepare a patch file compatible
>>>>> with Nutch 2.2.1 (the latest version)?
>>>>> I will try your new patch.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>>>>>
>>>>>> Hi Shanaka,
>>>>>>
>>>>>> Yes, a new patch is on the way. I hope to post it on the issue tonight.
>>>>>> I cleaned up unnecessary code blocks, renamed methods, updated the Solr
>>>>>> schema, etc. :)
>>>>>>
>>>>>> Talat
>>>>>>
>>>>>>
>>>>>> 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>>
>>>>>> > Hi Talat,
>>>>>> > Thanks a lot. I came this far because of your patch and explanation.
>>>>>> > I've used the latest patch you published on 28/Feb/14 09:59. Did you
>>>>>> > mean that a new patch is on the way?
>>>>>> >
>>>>>> > Thanks, Shanaka
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
>>>>>> >
>>>>>> > > Hi Shanaka,
>>>>>> > >
>>>>>> > > I am developing NUTCH-1478, and it has some updates coming. If that
>>>>>> > > is a problem, I will answer your questions after I update the patch.
>>>>>> > > You can also review my last update :)
>>>>>> > >
>>>>>> > > Talat
>>>>>> > >
>>>>>> > >
>>>>>> > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
>>>>>> > >
>>>>>> > > > Hello ,
>>>>>> > > >
>>>>>> > > > I have configured Nutch 2.2.1 following the Nutch2Tutorial
>>>>>> > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it
>>>>>> > > > with Solr 4.7, and it's working fine. Then I wanted to parse HTML
>>>>>> > > > and index meta tags in Solr.
>>>>>> > > > Since parse-metatags is not supported by default, I followed
>>>>>> > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
>>>>>> > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and installed
>>>>>> > > > the patch NUTCH-1478v5.patch
>>>>>> > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
>>>>>> > > >
>>>>>> > > > I think I have installed it correctly, because I get the following
>>>>>> > > > output when I try to parse a URL:
>>>>>> > > >
>>>>>> > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>> > > > fetching: http://nutch.apache.org/
>>>>>> > > > parsing: http://nutch.apache.org/
>>>>>> > > > contentType: text/html
>>>>>> > > > signature: 030a8fe7684b5357663e041327e3d96b
>>>>>> > > > ---------
>>>>>> > > > Url
>>>>>> > > > ---------------
>>>>>> > > > http://nutch.apache.org/
>>>>>> > > > ---------
>>>>>> > > > Metadata
>>>>>> > > > ---------
>>>>>> > > > metatag.forrest-skin-name :     nutch
>>>>>> > > > metatag.forrest-version :     0.10-dev
>>>>>> > > > metatag.generator :     Apache Forrest
>>>>>> > > > metatag.content-type :     text/html; charset=UTF-8
>>>>>> > > >
>>>>>> > > > Now I am trying to index the metadata along with the other content
>>>>>> > > > in Solr. I have updated the Solr schema.xml with
>>>>>> > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
>>>>>> > > > to accept every generated field.
>>>>>> > > >
>>>>>> > > > My questions are:
>>>>>> > > > 1. How do I index the metadata in Solr? When I execute
>>>>>> > > >    ./bin/nutch parsechecker http://nutch.apache.org/
>>>>>> > > >    it extracts the meta tags and prints them on standard output.
>>>>>> > > >    How do I get Solr to index these meta tags?
>>>>>> > > > 2. Is it possible to integrate this with the default bin/crawl
>>>>>> > > >    script, with modifications?
>>>>>> > > >     bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/ 1
>>>>>> > > >     This indexes the sites' content in Solr but not the metadata.
>>>>>> > > >
>>>>>> > > > Can anyone please help me? Thanks in advance.
>>>>>> > > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Talat UYARER
>>>>>> > > Websitesi: http://talat.uyarer.com
>>>>>> > > Twitter: http://twitter.com/talatuyarer
>>>>>> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Talat UYARER
>>>>>> Websitesi: http://talat.uyarer.com
>>>>>> Twitter: http://twitter.com/talatuyarer
>>>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Talat UYARER
>>> Websitesi: http://talat.uyarer.com
>>> Twitter: http://twitter.com/talatuyarer
>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>
>>
>>
>


-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
