Re: newbie questions

2009-12-01 Thread Mischa Tuffield
Hello Brian, Getting a response from another newbie here, so I could be wrong (do excuse if I am). If you are attempting to run a search index from the filesystem you need to have the following in your nutch-site.xml : property namefs.default.name/name valuefile:value

Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Vinci
it to breadth-first search at the top level, then depth-first one by one) Sorry for many questions. -- View this message in context: http://www.nabble.com/Newbie-Questions%3A-http.max.delays%2C-view-fetched-page%2C-view-link-db-tp15156228p15156228.html Sent from the Nutch - User mailing list

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Martin Kuen
;)

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Vinci
PS: polyu.edu.hk . . . greetings to the HK Polytechnic University . . . (nice semester abroad . . . hehe ;)

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Martin Kuen
breadth-first. Sorry for many questions. HTH, Martin PS: polyu.edu.hk . . . greetings to the HK Polytechnic University . . . (nice semester abroad . . . hehe ;)

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Vinci
. Sorry for many questions. HTH, Martin PS: polyu.edu.hk . . . greetings to the HK Polytechnic University . . . (nice semester abroad . . . hehe ;)

Newbie questions about filter, bandwidth, NTLM and threads

2007-09-20 Thread Bent Hugh
I have some newbie questions. - There are two filters, crawl-urlfilter.txt and regex-urlfilter.txt. Which one should be configured under which conditions? - Is it possible to see how much bandwidth a Nutch crawl consumes? - Can the Nutch bot do NTLM authentication for websites in a domain

Code Newbie questions

2007-07-04 Thread Emmanuel JOKE
Hi Guys, I have a few questions: 1- I found that we have the lib lib-lucene-analyzers in the plugin folder. How does it work? Should I just add the definition lib-lucene-analyzers to the list of plugins in nutch-site.xml, or should I also add language-identifier, analysis-(fr|de|en)? 2- How do

Newbie questions about followed links

2007-03-08 Thread Jeroen Verhagen
Hi all, I started experimenting with Nutch using the NutchTutorial. I got a successful crawl to work using the command 'bin/nutch crawl urls -dir crawl' (no limitations on depth or number of documents). I noticed that Nutch finishes quite fast. When I looked in the HTML source of the main page

Re: Newbie questions about followed links

2007-03-08 Thread Hasan Diwan
Sir: On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote: Surely these links look ordinary enough to be seen and followed by nutch? Could someone please tell me what could be causing these links not to be followed? conf/urlfilter.txt.template contains the line: [EMAIL PROTECTED] Remove the '?'

Re: Newbie questions about followed links

2007-03-08 Thread Paul Liddelow
exactly what I was going to say! Cheers Paul On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote: Sir: On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote: Surely these links look ordinary enough to be seen and followed by nutch? Could someone please tell me what could be causing these links

Re: Newbie questions about followed links

2007-03-08 Thread Jeroen Verhagen
Hi Hasan, On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote: conf/urlfilter.txt.template contains the line: [EMAIL PROTECTED] Remove the '?' and the links will be followed. Thanks, that made it work. I had to comment out the whole line '[EMAIL PROTECTED]' to make it work, though? Even though

Re: [SOLVED] Newbie questions about followed links

2007-03-08 Thread djames
Hi, With your configuration of Nutch, the crawl doesn't take links with dynamic parameters. You must edit your regex filter at this line: # skip URLs containing certain characters as probable queries, etc. [EMAIL PROTECTED]

Re: Newbie questions

2005-07-05 Thread Jack Tang
Hi Vacuum, I hope the Nutch wiki will help you much :) http://wiki.apache.org/nutch/ Regards /Jack On 7/6/05, Vacuum Joe [EMAIL PROTECTED] wrote: Hello Nutch-gurus, I have some very straightforward and yet totally newbie questions which I hope some kind person would answer. First of all

Re: Newbie questions

2005-07-05 Thread Vacuum Joe
I hope nutch wiki will help you much:) http://wiki.apache.org/nutch/ Hello Jack, Yes, I have been reading it. The db file contains a database of all the link structure and pages of the web. But what is a segment in this case? I assume a segment contains page content? And then there is the