Re: newbie questions

2009-12-01 Thread Mischa Tuffield
Hello Brian, Getting a response from another newbie here, so I could be wrong (do excuse if I am). If you are attempting to run a search index from the filesystem you need to have the following in your nutch-site.xml: <property> <name>fs.default.name</name> <value>file:</value>

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Martin Kuen
Hi there, On Jan 29, 2008 5:23 PM, Vinci [EMAIL PROTECTED] wrote: Hi, Thank you :) One more question for the fetched page reading: I prefer I can dump the fetched page into a single html file. You could modify the Fetcher class (org.apache.nutch.fetch.Fetcher) to create a separate file

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Vinci
Hi, Thank you :) One more question for the fetched page reading: I prefer I can dump the fetched page into a single html file. No other way besides invert the inverted file? Martin Kuen wrote: Hi, On Jan 29, 2008 11:11 AM, Vinci [EMAIL PROTECTED] wrote: Hi, I am new to nutch and I

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Martin Kuen
Hi, On Jan 29, 2008 11:11 AM, Vinci [EMAIL PROTECTED] wrote: Hi, I am new to nutch and I am trying to run nutch to fetch something from specific websites. Currently I am running 0.9. As I have limited resources, I don't want nutch to be too aggressive, so I want to set some delay, but I

Re: Newbie Questions: http.max.delays, view fetched page, view link db

2008-01-29 Thread Vinci
Hi, thank you. :) Seems I need to write a Java program to write out the file and do the transformation. Another question about the dumped linkdb: I find escaped html appearing at the end of the link, is it the fault of the parser (the html most likely not valid, but I really don't need the chunk of the

Re: Newbie questions about followed links

2007-03-08 Thread Hasan Diwan
Sir: On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote: Surely these links look ordinary enough to be seen and followed by nutch? Could someone please tell me what could be causing these links not be followed? conf/urlfilter.txt.template contains the line: [EMAIL PROTECTED] Remove the '?'

Re: Newbie questions about followed links

2007-03-08 Thread Paul Liddelow
exactly what I was going to say! Cheers Paul On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote: Sir: On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote: Surely these links look ordinary enough to be seen and followed by nutch? Could someone please tell me what could be causing these links

Re: Newbie questions about followed links

2007-03-08 Thread Jeroen Verhagen
Hi Hasan, On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote: conf/urlfilter.txt.template contains the line: [EMAIL PROTECTED] Remove the '?' and the links will be followed. Thanks, that made it work. I had to comment out the whole line '[EMAIL PROTECTED]' to make it work though ? Even though
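A minimal sketch of the behaviour discussed in this thread, assuming the URL filter file is a list of signed regular expressions where the first matching rule decides and an unmatched URL is dropped. The rule list, the example URLs and the accepts() helper are hypothetical illustrations written for this note; this is not Nutch code, and the actual line in the template file is redacted in the excerpts above.

```python
import re

# Hypothetical rules in the style of a signed-regex URL filter file.
# "-" means skip the URL, "+" means keep it; rules are tried in order.
RULES_WITH_QUERY_BLOCK = [
    ("-", r"[?]"),  # assumed exclusion rule: skip any URL containing '?'
    ("+", r"."),    # accept everything else
]

# The same list with the exclusion rule commented out (removed).
RULES_WITHOUT_QUERY_BLOCK = [
    ("+", r"."),
]


def accepts(url, rules):
    """Return True if the first matching rule is '+'.

    Returns False if the first matching rule is '-' or if no rule matches.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in ("http://example.com/list", "http://example.com/item?id=3"):
        print(url,
              accepts(url, RULES_WITH_QUERY_BLOCK),
              accepts(url, RULES_WITHOUT_QUERY_BLOCK))
```

Under these assumptions, a link carrying a query string is rejected while the exclusion rule is present and followed once that rule is removed, which is consistent with what Jeroen reports.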

Re: Newbie questions

2005-07-05 Thread Jack Tang
Hi Vacuum I hope nutch wiki will help you much:) http://wiki.apache.org/nutch/ Regards /Jack On 7/6/05, Vacuum Joe [EMAIL PROTECTED] wrote: Hello Nutch-gurus, I have some very straightforward and yet totally newbie questions which I hope some kind person would answer. First of all,

Re: Newbie questions

2005-07-05 Thread Vacuum Joe
I hope nutch wiki will help you much:) http://wiki.apache.org/nutch/ Hello Jack, Yes, I have been reading it. The db file contains a database of all the link structure and pages of the web. But what is a segment in this case? I assume a segment contains page content? And then there is the