Hi,

My urls file contains mysite.com, and this site has links to all of the
files, such as cv.htm, mypaper.pdf, etc.

Thanks.
Alex.

-----Original Message-----
From: Susam Pal <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wed, 9 Jan 2008 8:34 pm
Subject: Re: some crawl problems

What is in your seed URL list? Nutch discovers new URLs in the pages it
fetches at the current depth and fetches them at the next level of
depth.

So, if you have http://mysite.com/ in your seed URL list and the home
page does not have a link to http://mysite.com/cv.htm, the crawler
wouldn't be able to reach that page.
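
To illustrate the point above, here is a rough Python sketch of
depth-by-depth link discovery. It is not Nutch code: the crawl() function
and the LINKS graph are made up for illustration, and the graph assumes
that the home page links to mypaper.pdf but not to cv.htm.

```python
# Hypothetical link graph: page URL -> URLs that page links to.
# Assumption for illustration: nothing links to cv.htm.
LINKS = {
    "http://mysite.com/": ["http://mysite.com/mypaper.pdf"],
    "http://mysite.com/mypaper.pdf": [],
    "http://mysite.com/cv.htm": [],
}


def crawl(seeds, links, depth):
    """Return the URLs fetched after `depth` rounds, in fetch order.

    Each round fetches only the URLs discovered in the previous round,
    so a page that nothing links to is never fetched.
    """
    known = set(seeds)       # every URL seen so far
    frontier = list(seeds)   # URLs to fetch in the current round
    fetched = []
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            fetched.append(url)
            # Newly discovered links are queued for the next round.
            for out in links.get(url, []):
                if out not in known:
                    known.add(out)
                    next_frontier.append(out)
        frontier = next_frontier
    return fetched


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, crawl(["http://mysite.com/"], LINKS, d))
```

In this toy graph, increasing the depth picks up mypaper.pdf in the
second round, but cv.htm never shows up at any depth because no fetched
page links to it.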

Regards,
Susam Pal

On Jan 10, 2008 3:56 AM,  <[EMAIL PROTECTED]> wrote:
>
> Hello all,
>
> I am using nutch 9, and when I fetch a couple of sites nutch does not
> include pages other than the main one.
> For example, if I have mysite.com/cv.htm, nutch fetches only mysite.com.
> It does not fetch cv.htm and other files in the site.
> I noticed that if I run
>
>   bin/nutch generate crawl/crawldb crawl/segments -topN 1000
>
> after
>
>   bin/nutch generate crawl/crawldb crawl/segments
>
> it includes some of those pages but not all of them.
>
> Is there any way to tell nutch to crawl all the objects in mysite.com?
>
> Also, how can I put nutch on a website, let's say at mysite.com/search?
>
> Thanks in advance.
> Alex.
>
>
>
> -----Original Message-----
> From: payo <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wed, 9 Jan 2008 10:18 am
> Subject: Re: subcollections
>
>
>
>
> Hi all,
>
> I can configure this part as follows.
>
> 1. Add the subcollection plugin to nutch-site.xml in Tomcat:
>
> Tomcat\webapps\ROOT\WEB-INF\classes\nutch-site.xml
>
> 2. Add a select tag to search.jsp listing the subcollections:
>
> line 147 <form name="search" action="../search.jsp" method="get">
>  <SELECT NAME="subcollection">
>    <option selected value=<%=subcoleccion%>><%=subcoleccion%></option>
>    <OPTION VALUE="apache">Apache</OPTION>
>    <OPTION VALUE="nutch">Nutch</OPTION>
>    <OPTION VALUE="xml">XML</OPTION>
> </SELECT>
>
>
> thanks
>
> --
> View this message in context: 
> http://www.nabble.com/subcollections-tp14373976p14716644.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
> ________________________________________________________________________
> More new features than ever.  Check out the new AIM(R) Mail ! -
> http://webmail.aim.com
>