Re: Way to fetch only new sites

Jayadeep Reddy Thu, 01 Aug 2013 06:26:27 -0700

Laxmi, someone in the group should have a solution for skipping the database
table while crawling new sites. I searched online but can't find one.




On Thu, Aug 1, 2013 at 6:47 PM, A Laxmi <[email protected]> wrote:

> Jayadeep - I have the same problem as well. When I run a fresh crawl, only
> the URLs in the webpage table are crawled over and over, and the new URLs
> in seed.txt are ignored.
>
>
> On Thu, Aug 1, 2013 at 9:03 AM, Jayadeep Reddy
> <[email protected]> wrote:
>
> > I am using Nutch 2.1. Every time I run a crawl from the dmoz directory, my
> > existing crawled pages in the database are fetched again (taking a long
> > time). Is there a way to crawl only new sites?
> >
> > Thank you
> >
> > --
> > Jayadeep Reddy.S,
> > M.D & C.E.O
> > e Health Access Pvt.Ltd
> > www.ehealthaccess.com
> > Hyderabad-Chennai-Banglore
> > http://www.youtube.com/watch?v=0k5LX8mw6Sk
> >
>



-- 
Jayadeep Reddy.S,
M.D & C.E.O
e Health Access Pvt.Ltd
www.ehealthaccess.com
Hyderabad-Chennai-Banglore
http://www.youtube.com/watch?v=0k5LX8mw6Sk
