Re: Hosts File & Nutch 1.0+

Alex Tue, 19 Apr 2011 16:45:52 -0700

I edited that so that it does not disclose the location of my rootUrlDir. The path is accurate.

I am going to find out what command is given to Nutch, but basically the application developer has confirmed that the issue is the hosts file, or something else on the server that keeps it from searching itself.


Alex
On Apr 19, 2011, at 5:22 PM, Mark Achee wrote:

From your logs:


INFO sitesearch.CrawlerUtil: rootUrlDir = /path/to/directory/

Looks like you didn't set the seed urls directory. If that's not enough
info for you to fix it, send the full command you're running.

-Mark

On Thu, Apr 14, 2011 at 10:57 PM, Alex <[email protected]> wrote:

Hi,

I am new to Nutch.  I have an application that uses Nutch to search.
I have configured the application so that Nutch can run.  However,
after a lot of troubleshooting I have been pointed to the fact that
there is something wrong with my hosts file. My hostname is different
from my domain name, and that "seems" to make Nutch stop at depth 1.
Does anyone have any idea what the correct configuration of the
hosts file is so that Nutch runs properly?

My domain name resolves fine.  Please help me!
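
A quick way to see what the server itself makes of the two names is to resolve both the machine's hostname and the site's domain name locally; the hosts file is typically consulted before DNS, so an entry there can change the answer. The following is only a minimal Python sketch and is not part of Nutch or dotCMS. The helper names resolve_host and is_loopback are made up for illustration, and www.mydomain.com is the placeholder domain used in the logs. It does not explain why the crawl stops at depth 1; it only reports what each name resolves to on the machine where it is run.

```python
import ipaddress
import socket


def resolve_host(name):
    """Return the sorted, de-duplicated addresses `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if `address` is a loopback address such as 127.0.0.1 or ::1."""
    # Drop an IPv6 scope id (for example "%eth0") before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


if __name__ == "__main__":
    # "www.mydomain.com" is the placeholder from the logs; substitute the real domain.
    for name in (socket.gethostname(), "www.mydomain.com"):
        addresses = resolve_host(name)
        if not addresses:
            print(f"{name}: does not resolve on this machine")
            continue
        for address in addresses:
            note = " (loopback)" if is_loopback(address) else ""
            print(f"{name}: {address}{note}")
```

Running this on the server that hosts the site shows whether the domain name resolves there at all, and whether it points at a loopback address or at the public one.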

Here are the logs of the indexing:

Stopping at depth=1 - no more URLs to fetch.

INFO sitesearch.CrawlerUtil: indexHost : Starting an Site Search index on host www.mydomain.com
INFO sitesearch.CrawlerUtil: site search crawl started in: /opt/dotcms/dotCMS/assets/search_index/www.mydomain.com/1-XXX_temp/crawl-index
INFO sitesearch.CrawlerUtil: rootUrlDir = /path/to/directory/search_index/www.mydomain.com/url_folder
INFO sitesearch.CrawlerUtil: threads = 10
INFO sitesearch.CrawlerUtil: depth = 20
INFO sitesearch.CrawlerUtil: indexer=lucene
INFO sitesearch.CrawlerUtil: Stopping at depth=1 - no more URLs to fetch.
INFO sitesearch.CrawlerUtil: site search crawl finished: /directorypath/search_index/www.mydomain.com/1xxx/crawl-index
INFO sitesearch.CrawlerUtil: indexHost : Finished Site Search index on host www.mydomain.com
