Hi Shanaka,

I uploaded a new patch for NUTCH-1478. If you have time, can you review it?

Thanks


2014-03-11 17:21 GMT+02:00 Shanaka Jayasundera <[email protected]>:

> Hi Talat,
>
> Excellent news. Will you be able to prepare a patch file compatible with
> Nutch 2.2.1 (the latest version)?
> I will try your new patch.
>
> Thanks,
>
>
> On Tue, Mar 11, 2014 at 11:14 AM, Talat Uyarer <[email protected]> wrote:
>
> > Hi Shanaka,
> >
> > Yes, a new patch is on the way. I hope to post it on the issue tonight. I
> > cleaned up unnecessary code blocks, renamed methods, updated the Solr
> > schema, etc. :)
> >
> > Talat
> >
> >
> > 2014-03-11 16:47 GMT+02:00 Shanaka Jayasundera <[email protected]>:
> >
> > > Hi Talat,
> > > Thanks a lot. I came this far because of your patch and explanation. I've
> > > used the latest patch you published on 28/Feb/14 09:59. Do you mean that
> > > a new patch is on the way?
> > >
> > > Thanks, Shanaka
> > >
> > >
> > >
> > > On Tue, Mar 11, 2014 at 10:24 AM, Talat Uyarer <[email protected]> wrote:
> > >
> > > > Hi Shanaka,
> > > >
> > > > I am developing NUTCH-1478, and it has some updates. If that causes a
> > > > problem, I will answer your questions after my patch update. You can
> > > > also review my last update :)
> > > >
> > > > Talat
> > > >
> > > >
> > > > 2014-03-11 14:27 GMT+02:00 Shanaka Jayasundera <[email protected]>:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have configured Nutch 2.2.1 following the Nutch2Tutorial
> > > > > <https://wiki.apache.org/nutch/Nutch2Tutorial> and integrated it with
> > > > > Solr 4.7, and it's working fine. Then I wanted to parse HTML and index
> > > > > meta tags in Solr.
> > > > > Since parse-metatags is not supported by default, I followed
> > > > > "Parse-metatags and index-metadata plugin for Nutch 2.x series"
> > > > > <https://issues.apache.org/jira/browse/NUTCH-1478> and installed the
> > > > > patch NUTCH-1478v5.patch
> > > > > <https://issues.apache.org/jira/secure/attachment/12631702/NUTCH-1478v5.patch>.
> > > > >
> > > > > I think I have installed it correctly because I get the following
> > > > > output when I try to parse a URL:
> > > > >
> > > > > $ ./bin/nutch parsechecker http://nutch.apache.org/
> > > > > fetching: http://nutch.apache.org/
> > > > > parsing: http://nutch.apache.org/
> > > > > contentType: text/html
> > > > > signature: 030a8fe7684b5357663e041327e3d96b
> > > > > ---------
> > > > > Url
> > > > > ---------------
> > > > > http://nutch.apache.org/
> > > > > ---------
> > > > > Metadata
> > > > > ---------
> > > > > metatag.forrest-skin-name :     nutch
> > > > > metatag.forrest-version :     0.10-dev
> > > > > metatag.generator :     Apache Forrest
> > > > > metatag.content-type :     text/html; charset=UTF-8
> > > > >
> > > > > Now I am trying to index the metadata along with other content in
> > > > > Solr. I have updated the Solr schema.xml with
> > > > > <field name="meta_*" type="string" stored="true" indexed="true"/>
> > > > > to accept every generated field.
> > > > >
> > > > > My questions are:
> > > > > 1. How do I index metadata in Solr? When I execute
> > > > >     ./bin/nutch parsechecker http://nutch.apache.org/
> > > > >    it extracts the meta tags and prints them on standard output.
> > > > >    How do I get these meta tags indexed in Solr?
> > > > > 2. Is it possible to integrate this with the default bin/crawl
> > > > >    script, with modifications?
> > > > >     bin/crawl urls/seed.txt TestCrawl1.3 http://localhost:8983/solr/1
> > > > >    This indexes the sites' content in Solr but not the metadata.
> > > > >
> > > > > Can anyone please help me? Thanks in advance.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Talat UYARER
> > > > Websitesi: http://talat.uyarer.com
> > > > Twitter: http://twitter.com/talatuyarer
> > > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
> > > >
> > >
> >
> >
> >
> >
>
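
One note on the schema line quoted above, offered as a suggestion rather than a confirmed fix: as far as I know, Solr treats a wildcard name such as meta_* as a pattern only when it is declared with dynamicField, not with field. A minimal sketch, assuming the same type and attributes as in the quoted line and an existing "string" field type in schema.xml:

```xml
<!-- Assumption: this replaces the <field name="meta_*" .../> line quoted above. -->
<dynamicField name="meta_*" type="string" stored="true" indexed="true"/>
```

Whether the patched indexer actually emits fields with a meta_ prefix should be checked against the patch itself; the parsechecker output quoted above shows the keys with a metatag. prefix.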



