I have followed the tutorial at media-style.com and actually have a
mapred installation of nutch working. Thanks Stefan :)
My question now is what the correct steps are to continuously fetch and
index. I have read some people talking about mergesegs and updatedb; however,
Stefan's tutorial doesn't list these
/classes/nutch-default.xml the parameter
searcher.dir just /user/nutchuser .
That's it.
HTH
Stefan
On 16.12.2005 at 02:55, Michael Taggart wrote:
I got mapred to complete a full index cycle. I now would like to search
the index I created except I can't find out how to do
Sorry Stefan, I am so used to typing usr that I wrote my email
incorrectly. Here is exactly what is in my nutch-site.xml:
<property>
<name>searcher.dir</name>
<value>/user/root</value>
<description>
Path to root of index directories. This directory is searched (in
order) for either the file
.
Uncompress your nutch-XXX.war file into a folder called ROOT.war with
unzip and change this in ROOT.war/WEB-INF/classes also.
Then you can simply copy this folder into TOMCAT/webapps, that's it.
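The steps above can be sketched in Python as follows. This is only a minimal illustration, not the procedure from the tutorial: the function names, the default folder name ROOT, and the assumption that the war contains WEB-INF/classes/nutch-site.xml with ordinary property/name/value elements are all hypothetical. If your folder is named ROOT.war as in the message, pass that name instead.

```python
import shutil
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def set_searcher_dir(config_path, searcher_dir):
    """Set the searcher.dir property in a property/name/value style XML file.

    Note: ElementTree's default parser drops XML comments when rewriting.
    """
    tree = ET.parse(str(config_path))
    root = tree.getroot()
    prop = None
    for candidate in root.iter("property"):
        if candidate.findtext("name") == "searcher.dir":
            prop = candidate
            break
    if prop is None:
        # Property not present yet: append a new one to the root element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "searcher.dir"
    value = prop.find("value")
    if value is None:
        value = ET.SubElement(prop, "value")
    value.text = searcher_dir
    tree.write(str(config_path), encoding="utf-8", xml_declaration=True)


def deploy_webapp(war_path, webapps_dir, searcher_dir,
                  folder_name="ROOT", config_name="nutch-site.xml"):
    """Unpack the war into <webapps_dir>/<folder_name> and set searcher.dir.

    folder_name and config_name are assumptions; adjust them to your setup.
    """
    target = Path(webapps_dir) / folder_name
    if target.exists():
        shutil.rmtree(target)  # start from a clean folder
    with zipfile.ZipFile(str(war_path)) as war:
        war.extractall(str(target))
    set_searcher_dir(target / "WEB-INF" / "classes" / config_name, searcher_dir)
    return target
```

A call might look like deploy_webapp("nutch-XXX.war", "TOMCAT/webapps", "/user/root"), where both paths are placeholders for your own locations.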
On 16.12.2005 at 20:09, Michael Taggart wrote:
Sorry Stefan, I am so used to typing usr that I
Should I specify that urls.txt file as /user/root/urls/urls.txt so it
pulls it off the ndfs?
On Fri, 2005-12-16 at 21:39 +0100, Stefan Groschupf wrote:
I would like to crawl a list of domains,
but I would like crawling limited to just those domains. When I first
played around with nutch in
I'm also guessing that it's important for all tasktrackers to have the
appropriate configuration set in their conf/nutch-site.xml or can I just
do it on the namenode?
On Fri, 2005-12-16 at 12:57 -0800, Michael Taggart wrote:
Should I specify that urls.txt file as /user/root/urls/urls.txt so
Marko,
Thanks for the reply. Copying that folder to my nutch installation
worked! No errors here. Can't wait to unleash the power of this program.
Thanks Again,
Mike
On Thu, 2005-12-15 at 09:42 +0100, Marko Bauhardt wrote:
Hi Mike,
Exception in thread "main" java.lang.NullPointerException
I've followed the steps in the media-style wiki for setting up a map
reduce system. I am only having one strange error when I attempt to
start the tasktrackers. Here is my output:
[EMAIL PROTECTED] nutch]# bin/nutch-daemon.sh start tasktracker
starting tasktracker, logging
to
, although everything seems to work fine
after that, even though it gives that error, so it's probably not a problem.
- Matt Zytaruk
Michael Taggart wrote:
I've followed the steps in the media-style wiki for setting up a map
reduce system. I am only having one strange error when I
boxA.localnetwork since the name from the outside would be something
like:
boxA.companyDomain.com. So double-check that the name boxA uses to
identify itself to other boxes (host.conf) is also set up in
the DNS the other boxes use.
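
One way to run that check is a small Python script. This is only a sketch under assumptions: the function names are made up for illustration, and it tests nothing more than whether a name resolves from the machine it runs on. Run it on the other boxes with the name boxA reports for itself.

```python
import socket


def check_hostname(name, resolver=socket.gethostbyname):
    """Return the IP address `name` resolves to, or None if the lookup fails."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


def report(names, resolver=socket.gethostbyname):
    """Map each host name to its resolved address (None means unresolvable)."""
    return {name: check_hostname(name, resolver) for name in names}


if __name__ == "__main__":
    # Names this machine uses for itself; compare them with what the
    # other boxes can resolve through their DNS.
    own_names = [socket.gethostname(), socket.getfqdn()]
    for name, addr in report(own_names).items():
        print(f"{name}: {addr or 'does NOT resolve'}")
```

A name that comes back as unresolvable on a tasktracker box but resolves on boxA itself would point to the kind of mismatch described above.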
HTH
Stefan
On 14.12.2005 at 22:49, Michael
check that the name boxA uses to
identify itself to other boxes (host.conf) is also set up in
the DNS the other boxes use.
HTH
Stefan
On 14.12.2005 at 22:49, Michael Taggart wrote:
I've followed the steps in the media-style wiki for setting up a map
reduce system. I am
is wrong with the jobtracker. Can I have the namenode also be
the jobtracker or would this cause a conflict?
Mike
On Wed, 2005-12-14 at 16:36 -0800, Michael Taggart wrote:
Ok, I think I have boiled the problem down. Turns out the jobtracker was
actually never running on my BoxA. When I start
I have downloaded, installed, and successfully played around with nutch
and have to say I am quite impressed with the power of this program.
Basically, I would like to hire a nutch expert to help me lay out a plan
on how to use nutch for the following scenario. We have about 1000
domains that we
and legwork myself. Just need a good guru to mentor and point me
in the right directions.
On Mon, 2005-12-12 at 15:00 -0800, sub paul wrote:
I would jump on $500 offer but I have someone paying me $250 already, for
doing twice the work.
On 12/12/05, Michael Taggart [EMAIL PROTECTED] wrote:
I have
Thanks Stefan,
I see you actually wrote that wiki article. I am going to do my best to
figure it out. I'll let the group know if I have any problems.
Mike
On Tue, 2005-12-13 at 02:53 +0100, Stefan Groschupf wrote:
Nevertheless, the mailing lists are amongst the best I've ever seen -
Hi all,
I am a total newb with nutch and tomcat, but I have followed the steps
outlined in http://lucene.apache.org/nutch/tutorial.html#Getting+Started
and was able to get the nutch page to show up when I go to
mydomain:8080. However, my problem is when I run a search. Here is the
following output
17 matches