Hi...the files are available on the server but are not necessarily hyperlinked in the HTML from the main page. In fact, I was just using that directory for storage. Now I want to be able to discover files like them on other servers, and I'm wondering whether that is possible.
Thanks,
Adam

On Wed, Aug 24, 2011 at 4:58 PM, lewis john mcgibbney <[email protected]> wrote:

> Hi Adam,
>
> My initial thoughts are that you are correct. It is very unusual for your
> files to be located at a URL in the same domain which is not referenced by
> the top-level or a subsequent-level URL within the domain.
>
> What I would suggest is that you have a look through your hadoop.log, as
> well as use some of the commands which enable you to investigate your
> crawldb, segment(s) and linkdb if you've created one.
>
> Have a look at the wiki under command line options.
>
> On Wed, Aug 24, 2011 at 9:03 PM, Adam Estrada <[email protected]> wrote:
>
>> All,
>>
>> I have a root domain, and a couple of directories deep I have some files
>> that I want to index. The problem is that they are not referenced on the
>> main page using a hyperlink or anything like that.
>>
>> http://www.geoglobaldomination.org/kml/temp/
>>
>> I want to be able to crawl down into /kml/temp/ without knowing that it's
>> even there. Is there a way to do this in Nutch?
>>
>> echo http://www.geoglobaldomination.org > urls
>>
>> ./nutch crawl urls -threads 10 -depth 10 -topN 20 -solr http://172.16.2.107:8983/solr
>>
>> Nothing, and I suspect that it's because there is not a hyperlink on the
>> main page.
>>
>> Thoughts?
>> Adam
>
> --
> *Lewis*
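For anyone hitting this thread later: a crawler can only follow links it sees, so a directory that nothing links to won't be discovered unless the server exposes a directory listing page for it, or you seed the URL explicitly. A minimal sketch of the explicit-seed approach is below. The seed-file layout matches the thread's example; the Nutch commands themselves (crawl, readdb, readseg, readlinkdb) are standard Nutch 1.x CLI entry points, but they are shown commented out since they need a full Nutch installation and crawl output directories (paths like `crawl/crawldb` are assumptions, not from the thread):

```shell
# Seed the crawler with the unlinked deep directory directly,
# alongside the root domain.
mkdir -p urls
cat > urls/seed.txt <<'EOF'
http://www.geoglobaldomination.org/
http://www.geoglobaldomination.org/kml/temp/
EOF

# Then crawl as in the thread (requires a Nutch install):
# ./nutch crawl urls -threads 10 -depth 10 -topN 20 \
#   -solr http://172.16.2.107:8983/solr

# Afterwards, inspect what was actually fetched, per Lewis's suggestion
# (output paths assumed to be under ./crawl):
# ./nutch readdb crawl/crawldb -stats
# ./nutch readseg -list -dir crawl/segments
# ./nutch readlinkdb crawl/linkdb -dump linkdump
```

If `/kml/temp/` serves an auto-generated index page (Apache's autoindex, for instance), seeding just that directory URL lets the crawler pick up every file listed in it on the next fetch cycle.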

