Hello Brian,
You are getting a response from another newbie here, so I could be wrong (do
excuse me if I am).
If you are attempting to run a search index from the filesystem you need to
have the following in your nutch-site.xml:
<property>
<name>fs.default.name</name>
<value>file:</value>
</property>
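As a quick sanity check on the setting above, here is a minimal Python sketch that reads a property out of a nutch-site.xml style document. It assumes the file uses a `<configuration>` root with `<property>` children; the `SITE_XML` string and the `read_properties` helper are made up for illustration, and the value `file:` is copied as it appears in the message (it may have been cut short by the archive).

```python
import xml.etree.ElementTree as ET

# Hypothetical file content. The <configuration> root is an assumption;
# the value "file:" is taken from the message as shown.
SITE_XML = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    props = read_properties(SITE_XML)
    print("fs.default.name =", props.get("fs.default.name"))
```

Pointing the same helper at the text of your real nutch-site.xml should show whether the override is actually present.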
it to breadth-first search at the top level, then
depth-first one by one)
Sorry for many questions.
--
View this message in context:
http://www.nabble.com/Newbie-Questions%3A-http.max.delays%2C-view-fetched-page%2C-view-link-db-tp15156228p15156228.html
Sent from the Nutch - User mailing list
PS: polyu.edu.hk . . . greetings to the HK Polytechnic University . . .
(nice semester abroad . . . hehe ;)
breadth-first.
Sorry for many questions.
HTH,
Martin
I have some newbie questions.
- There are two filters crawl-urlfilter.txt and regex-urlfilter.txt.
Which one should be configured in which condition?
- Is it possible to see how much bandwidth a Nutch crawl consumes?
- Can the Nutch bot do NTLM authentication for websites in a domain
Hi Guys,
I have a few questions:
1- I found that we have the lib lib-lucene-analyzers in the plugin folder.
How does it work? Should I just add the definition lib-lucene-analyzers
to the list of plugins in nutch-site.xml, or should I also add
language-identifier, analysis-(fr|de|en)?
2- How do
Hi all,
I started experimenting with Nutch using the NutchTutorial. I got a
successful crawl to work using the command 'bin/nutch crawl urls -dir
crawl' (no limitations on depth or number of documents). I noticed
that Nutch finishes quite fast. When I looked in the source-html of
the main page
Sir:
On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote:
Surely these links look ordinary enough to be seen and followed by
nutch? Could someone please tell me what could be causing these links
not to be followed?
conf/urlfilter.txt.template contains the line:
-[?*!@=]
Remove the '?'
Exactly what I was going to say!
Cheers
Paul
On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote:
Sir:
On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote:
Surely these links look ordinary enough to be seen and followed by
nutch? Could someone please tell me what could be causing these links
Hi Hasan,
On 3/8/07, Hasan Diwan [EMAIL PROTECTED] wrote:
conf/urlfilter.txt.template contains the line:
-[?*!@=]
Remove the '?' and the links will be followed.
Thanks, that made it work.
I had to comment out the whole line '-[?*!@=]' to make it work though
? Even though
Hi,
With your Nutch configuration, the crawl does not follow links with dynamic
parameters.
You must edit your regex filter at this line:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
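To see why such a rule drops dynamic links, here is a small Python sketch of first-match-wins URL filtering. It rests on several assumptions: that the line masked by the archive is the stock rule `-[?*!@=]`, that a leading `-` rejects and a leading `+` accepts, that the first rule whose pattern is found anywhere in the URL decides, and that a URL matching no rule is rejected. The names `parse_rules` and `accepts` and the example.com URLs are made up for illustration, and Python's `re` only approximates the Java regex engine Nutch uses.

```python
import re

# Assumed rule file, modelled on a stock regex-urlfilter.txt.
STOCK_RULES = """\
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""

# Same file with only the '?' removed from the character class.
EDITED_RULES = STOCK_RULES.replace("-[?*!@=]", "-[*!@=]")


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First rule whose pattern occurs in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    for label, text in (("stock", STOCK_RULES), ("edited", EDITED_RULES)):
        rules = parse_rules(text)
        for url in ("http://example.com/page.html",
                    "http://example.com/view?id=3",
                    "http://example.com/view?help"):
            print(label, url, accepts(rules, url))
```

Under these assumptions, a link such as `view?id=3` is still rejected after only the '?' is removed, because the '=' remains in the character class. A link needs neither '?' nor any of the other listed characters to get past the edited rule, which would be consistent with having to comment out the whole line.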
Hi Vacuum
I hope the Nutch wiki will help you a lot :)
http://wiki.apache.org/nutch/
Regards
/Jack
On 7/6/05, Vacuum Joe [EMAIL PROTECTED] wrote:
Hello Nutch-gurus,
I have some very straightforward and yet totally
newbie questions which I hope some kind person would
answer.
First of all
I hope the Nutch wiki will help you a lot :)
http://wiki.apache.org/nutch/
Hello Jack,
Yes, I have been reading it. The db file contains a
database of all the link structure and pages of the
web. But what is a segment in this case? I assume a
segment contains page content? And then there is the