Does anyone have any insight into the following error I am seeing in the
Hadoop logs? Is this something I should be concerned with, or is it expected
that this shows up in the logs from time to time? If it is not expected,
where can I look for more information on what is going on?
2009-10-16 17:02
Thanks for the quick response.
I am interested, as my company is looking at Google enterprise search (the
Google Search Appliance), and I was wondering whether the Nutch software
could be a possible option to evaluate.
At the moment we will be using Google as a search engine for the intranet
for provision of in
Depending on what you want to do, Solr may be a better choice as an
enterprise search server. If you need crawling, you can use Nutch or
attach a different crawler to Solr. If you want to do more full-web-style
search, then Nutch is a better option. What are your requirements?
Does anybody have any information on using Nutch for enterprise search, and
what would I need?
Is it just a case of the current Nutch package, or do you need other add-ons?
And how does that compare against Google Enterprise ?
thanks
On Fri, Oct 16, 2009 at 10:19 AM, Dennis Kubes wrote:
> Because you are crawling local files, you would either need urls in the
> initial urlDir text file, or those documents you are crawling would need to
> point to the other urls.
>
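As a concrete illustration of the first option in the quoted text, the urlDir seed file could list the local documents directly. The file: URLs below are made-up examples, and fetching them assumes the protocol-file plugin is enabled in your plugin.includes setting:

```
file:///var/www/intranet/docs/index.html
file:///var/www/intranet/docs/handbook.html
```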
Another way to do this is to put the following in the docume
Whole web crawling is about indexing the entire web versus deep indexing
of a single site.
The urls parameter is the urlDir, a directory that should hold one or
more text files listing the urls to be fetched. The dir parameter is the
output directory for the crawl.
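Putting the two parameters together, a minimal sketch of a crawl invocation might look like the following. The paths, seed url, and the -depth/-topN values are illustrative examples, not taken from this thread:

```shell
# Create the urlDir and drop in a seed file with one url per line.
mkdir -p urls
printf 'http://lucene.apache.org/nutch/\n' > urls/seed.txt

# urls  -> the urlDir holding the seed text file(s)
# crawl -> the output directory for the crawl data
# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
cat urls/seed.txt
```

The nutch command itself is left commented out here since it needs a Nutch installation on the path; the point is only the directory layout the two parameters expect.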