Hi,

In my urls file I have mysite.com, and that site has links to all the files, such as cv.htm, mypaper.pdf, etc.
Thanks.
Alex.

-----Original Message-----
From: Susam Pal <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wed, 9 Jan 2008 8:34 pm
Subject: Re: some crawl problems

What is present in your seed URL list? Nutch discovers new URLs from the pages it fetches in the current round and fetches them at the next level of depth. So, if you have http://mysite.com/ in your seed URL list and the home page does not have a link to http://mysite.com/cv.htm, the crawler wouldn't be able to reach that page.

Regards,
Susam Pal

On Jan 10, 2008 3:56 AM, <[EMAIL PROTECTED]> wrote:
>
> Hello all,
>
> I am using nutch 9 and when I fetch a couple of sites nutch does not include pages other than the main one.
> For example, if I have mysite.com/cv.htm, nutch fetches only mysite.com. It does not fetch cv.htm and other files in the site.
> I noticed that if I do
> bin/nutch generate crawl/crawldb crawl/segments -topN 1000
> after
> bin/nutch generate crawl/crawldb crawl/segments
>
> it includes some of those pages but not all of them.
>
> Is there any way to tell nutch to crawl all the objects in mysite.com?
>
> Also, I wondered how to put nutch in a website, let's say mysite.com/search?
>
> Thanks in advance.
> Alex.
>
> -----Original Message-----
> From: payo <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wed, 9 Jan 2008 10:18 am
> Subject: Re: subcollections
>
> hi to all
>
> i can configure this part.
>
> 1.- add the subcollection plugin in nutch-site.xml in Tomcat
>
> Tomcat\webapps\ROOT\WEB-INF\classes\nutch-site.xml
>
> 2.- add a select tag in search.jsp listing the subcollections
>
> line 147 <form name="search" action="../search.jsp" method="get">
> <SELECT NAME="subcollection">
> <option selected value="<%=subcoleccion%>"><%=subcoleccion%></option>
> <OPTION VALUE="apache">Apache</OPTION>
> <OPTION VALUE="nutch">Nutch</OPTION>
> <OPTION VALUE="xml">XML</OPTION>
> </SELECT>
>
> thanks
>
> --
> View this message in context:
> http://www.nabble.com/subcollections-tp14373976p14716644.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
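
For reference, here is a minimal sketch of the level-by-level discovery that Susam describes in this thread. It is a toy model in Python, not Nutch code: the function name crawl, the LINKS table, and the page old.htm are made up for illustration, and one "round" only roughly corresponds to one generate/fetch/updatedb cycle. It shows why a page is reached only if some already-fetched page links to it and enough rounds are run.

```python
def crawl(seeds, links, rounds):
    """Toy model of depth-limited crawling (hypothetical, not the Nutch API).

    seeds  -- list of starting URLs (like the seed URL list)
    links  -- dict mapping a URL to the URLs its page links to
    rounds -- number of fetch rounds (one per level of depth)
    Returns the URLs fetched, in fetch order.
    """
    known = set(seeds)       # every URL discovered so far
    fetched = []             # URLs actually fetched
    frontier = list(seeds)   # URLs to fetch in the current round
    for _ in range(rounds):
        discovered = []
        for url in frontier:
            fetched.append(url)
            # Outlinks of a fetched page become candidates for the next round.
            for target in links.get(url, []):
                if target not in known:
                    known.add(target)
                    discovered.append(target)
        frontier = discovered
    return fetched


# Assumed link structure: the home page links to cv.htm and mypaper.pdf,
# and cv.htm links to a hypothetical old.htm.
LINKS = {
    "http://mysite.com/": [
        "http://mysite.com/cv.htm",
        "http://mysite.com/mypaper.pdf",
    ],
    "http://mysite.com/cv.htm": ["http://mysite.com/old.htm"],
}

if __name__ == "__main__":
    for rounds in (1, 2, 3):
        print(rounds, crawl(["http://mysite.com/"], LINKS, rounds))
```

With one round only the home page is fetched; cv.htm and mypaper.pdf appear in the second round, and old.htm in the third. A page that no fetched page links to is never reached, however many rounds are run.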
