Re: page ranking weights

2005-08-12 Thread Jay Pound
- From: Jay Pound [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, August 11, 2005 4:49 PM Subject: page ranking weights At which step does Nutch figure out the weight of each page: the updatedb step or the index step? Thanks, -Jay

page ranking weights

2005-08-11 Thread Jay Pound
At which step does Nutch figure out the weight of each page: the updatedb step or the index step? Thanks, -Jay

Re: luke??

2005-08-08 Thread Jay Pound
class to make a manual search to see if your index is flawed in some way; that's a good way to start. Fredrik On 8/7/05, Jay Pound [EMAIL PROTECTED] wrote: I tell Luke to look in my index directory for 1 segment, and it then tells me it's not a Lucene index. I point directly to the l:/segments

Re: regex-url filter

2005-08-08 Thread Jay Pound
on a plugin made to do this -Jay Pound - Original Message - From: Chirag Chaman [EMAIL PROTECTED] To: nutch-user@lucene.apache.org; nutch-dev@lucene.apache.org Sent: Monday, August 08, 2005 3:02 PM Subject: RE: regex-url filter Here's a better way http://([a-z0-9]*\.)*.(com|org|net|biz
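
The pattern quoted above is cut off in this archive, so it cannot be reproduced here. As a rough sketch of the general idea only (allowing URLs by host suffix with a regular expression), the following uses a hypothetical, complete pattern and a hypothetical helper named accept; neither comes from the thread or from Nutch itself.

```python
import re

# Hypothetical allow-pattern, NOT the truncated one quoted in the thread.
# It accepts http URLs whose host consists of lowercase labels and ends
# in one of a few top-level domains.
ALLOWED = re.compile(r"^http://([a-z0-9-]+\.)+(com|org|net)(/|$)")


def accept(url: str) -> bool:
    """Return True when the URL matches the allow-pattern."""
    return ALLOWED.match(url) is not None
```

Anchoring the pattern at both ends of the host (the leading ^ and the trailing slash-or-end group) is what keeps a host such as www.example.com.evil.xyz from slipping through.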

Re: ndfs problem needs fix

2005-08-07 Thread Jay Pound
! sorry about the big e-mails, my brain goes much faster than my fingers!!! -J - Original Message - From: Andrzej Bialecki [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, August 07, 2005 3:00 PM Subject: Re: ndfs problem needs fix Jay Pound wrote

luke??

2005-08-07 Thread Jay Pound
I tell Luke to look in my index directory for 1 segment, and it then tells me it's not a Lucene index. I point directly to the l:/segments/2005xxx/index/. Does it work properly in Windows? Very cool tool anyway; check it out, for those who haven't. I found it on Andrzej's website http://www.getopt.org its

mapred question

2005-08-06 Thread Jay Pound
How would I set up mapred for SMP machines? I understand it will split up big jobs like indexing or updating the db into a bunch of chunks to be processed by separate machines. I have multiprocessor machines that I want to test this with internally; makes sense to utilize the

NDFS benchmark results

2005-08-06 Thread Jay Pound
it much more stable; it's very close to usable now, as it looks!! -Jay Pound PS: Doug, I would like to talk with you sometime about this if you have an opportunity. PPS: here is a snippet of the -report, just in case you're interested: [EMAIL PROTECTED] /nutch-ndfs $ ./bin/nutch org.apache.nutch.fs.TestClient

ndfs problem needs fix

2005-08-06 Thread Jay Pound
I'm copying data into the NDFS right now. I've had the server crash (bad mem timings, oops); it was running 2 datanodes and the namenode. It recovered from a flat-out crash perfectly (blue-screen kernel error, system beeping, Windows 2003 64 sucks). I started the datanodes first, then the namenode and

Re: near-term plan

2005-08-04 Thread Jay Pound
Doug, I also ran into this when I was testing NDFS: the system would have to wait for the namenode to tell the datanodes what data to receive and which data to replicate. I'm currently setting up Lustre to see how it works; it operates at the kernel level. Do you think if the namenode was

Re: dns lookup cache?

2005-08-03 Thread Jay Pound
I've got a fast internal DNS cache so Nutch won't need one, and it did stop a lot of the Nutch "host not found" and timeout errors. Most ISPs' DNS servers are already bogged down by client requests; if you dump 1 clients worth of DNS traffic they can break or not return results, so I made my own
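
To illustrate why a lookup cache takes load off an upstream DNS server, here is a minimal sketch of an in-process cache. The class name DnsCache and its resolve parameter are hypothetical and are not part of Nutch or of the setup described in the message; the sketch also assumes entries never expire, which a real cache would not do.

```python
import socket
from typing import Callable, Dict


class DnsCache:
    """Illustrative cache: each host name is resolved at most once."""

    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname):
        self._resolve = resolve            # upstream resolver (assumed interface)
        self._cache: Dict[str, str] = {}   # host name -> IP address string
        self.misses = 0                    # number of upstream lookups made

    def lookup(self, host: str) -> str:
        # Only go upstream on a cache miss; repeat lookups are served locally.
        if host not in self._cache:
            self.misses += 1
            self._cache[host] = self._resolve(host)
        return self._cache[host]
```

A crawler that fetches many pages from the same host would then issue one upstream query per host rather than one per page.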

Re: Fetcher delays - benchmarks

2005-08-02 Thread Jay Pound
I'm able to easily saturate my 10 Mbit connection, but it takes a powerful computer. If your computer is not so powerful, try to fetch with the -noParsing flag; it will offload the parsing until later. Even a quad Pentium 3 Xeon 700 MHz with 4 GB of RAM can only saturate about 5 Mbit. I've used

Memory usage

2005-08-02 Thread Jay Pound
requirements across multiple machines, or is there another servlet program (like Resin) that will require less memory to operate? Has anyone else run into this? Thanks, -Jay Pound

Re: Memory usage2

2005-08-02 Thread Jay Pound
in the system and 4x2.2 GHz processor cores. Until I need to cluster, that's what I have to play with for Nutch, in case you guys needed to know what hardware I'm running. Thank you -Jay Pound Fromped.com BTW Windows 2000 is not 100% stable with dual-core processors. Nutch is OK but can't do too many things