I'm able to easily saturate my 10 Mbit connection, but it takes a powerful
computer. If your computer is not so powerful, try fetching with
the -noParsing flag; it defers the parsing work until later.
Even a quad Pentium III Xeon 700 MHz with 4 GB of RAM can only saturate about
5 Mbit. I've used
requirements across multiple machines, or is there another servlet program
(like Resin) that requires less memory to operate? Has anyone else run
into this?
Thanks,
-Jay Pound
in the system and 4x2.2 GHz
processor cores. Until I need to cluster, that's what I have to play with for
Nutch,
in case you guys needed to know what hardware I'm running.
Thank you
-Jay Pound
Fromped.com
BTW, Windows 2000 is not 100% stable with dual-core processors. Nutch is OK
but can't do too many things
I've got a fast internal DNS cache so Nutch won't need one, and it did stop a
lot of the Nutch "host not found" and timeout errors. Most ISPs' DNS servers
are already bogged down by client requests; if you dump 1 clients worth
of DNS traffic on them they can break or not return results, so I made my own
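
To illustrate the caching idea in the message above, here is a minimal Python sketch of an in-process resolver cache. It is not Nutch code and not the setup described in the message. The class name CachingResolver, the ttl_seconds parameter, and the 300-second default are all assumptions made for illustration.

```python
import socket
import time


class CachingResolver:
    """Tiny in-process DNS cache (hypothetical helper, not part of Nutch)."""

    def __init__(self, lookup=socket.gethostbyname, ttl_seconds=300.0,
                 clock=time.monotonic):
        self._lookup = lookup      # function that performs the real DNS query
        self._ttl = ttl_seconds    # assumed lifetime of a cached answer
        self._clock = clock        # injectable clock, handy for testing
        self._cache = {}           # host -> (expiry_time, address)

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and entry[0] > now:
            # Fresh cached answer: no query reaches the upstream server.
            return entry[1]
        # Cache miss or expired entry: ask upstream (may raise socket.gaierror).
        address = self._lookup(host)
        self._cache[host] = (now + self._ttl, address)
        return address
```

Repeated lookups of the same host within the TTL are answered locally, which is the effect that takes load off an upstream DNS server during a crawl. Failed lookups are not cached in this sketch.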
Doug, I also ran into this when I was testing NDFS: the system would have to
wait for the namenode to tell the datanodes what data to receive and which
data to replicate. I'm currently setting up Lustre to see how it works; it
operates at the kernel level. Do you think if the namenode was
How would I set up mapred for SMP machines? I understand it will split up big
jobs like indexing or updating the db into a bunch of chunks to be processed
by separate machines. I have multiprocessor machines
that I want to test this with internally; makes sense to utilize the
it much more stable; it's very close to usable now
as it looks!!
-Jay Pound
PS: Doug, I would like to talk with you sometime about this if you have an
opportunity.
PPS: here is a snippet of the -report, in case you're interested:
[EMAIL PROTECTED] /nutch-ndfs
$ ./bin/nutch org.apache.nutch.fs.TestClient
I'm copying data into NDFS right now. I've had the server crash (bad mem
timings, oops); it was running 2 datanodes and the namenode. It recovered from
a flat-out crash perfectly (blue-screen kernel error, system beeping; Windows
2003 64 sucks). I started the datanodes first, then the namenode and
!
Sorry about the big e-mails; my brain goes much faster than my fingers!!!
-J
- Original Message -
From: Andrzej Bialecki [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, August 07, 2005 3:00 PM
Subject: Re: ndfs problem needs fix
Jay Pound wrote
I tell Luke to look at my index directory for 1 segment, and it then tells me
it's not a Lucene index. I point directly to l:/segments/2005xxx/index/.
Does it work properly in Windows? Very cool tool anyway; check it out if
you haven't. I found it on Andrzej's website, http://www.getopt.org its
class to make a manual search to see
if your index is flawed in some way. That's a good way to start.
Fredrik
On 8/7/05, Jay Pound [EMAIL PROTECTED] wrote:
I tell Luke to look at my index directory for 1 segment, and it then tells me
it's not a Lucene index. I point directly to the
l:/segments
on a plugin made to do this
-Jay Pound
- Original Message -
From: Chirag Chaman [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org; nutch-dev@lucene.apache.org
Sent: Monday, August 08, 2005 3:02 PM
Subject: RE: regex-url filter
Here's a better way
http://([a-z0-9]*\.)*.(com|org|net|biz
At which step does Nutch figure out the weight of each page: the updatedb
step or the index step?
Thanks,
-Jay
-
From: Jay Pound [EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Sent: Thursday, August 11, 2005 4:49 PM
Subject: page ranking weights
At which step does Nutch figure out the weight of each page: the updatedb
step or the index step?
Thanks,
-Jay