Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
Well, I think incrediBILL has a point: people might really
start excluding bots from their servers if it
becomes too much. What might help is if incrediBILL offered an
index
I am using DFS. My index contains 3706249 documents. At present, a search
takes from 2 to 4 seconds (I tested with a query of 3 search terms).
Tomcat runs on a box with dual Opteron 2.4 GHz CPUs and 16 GB RAM. I think
search is very slow now.
Can we make search faster?
What factors influence
Hi,
DFS is too slow for searching.
What we did was extract the segments to the local FS, i.e. to the hard
disk. Each machine has 2 x 300 GB hard disks in RAID.
bin/hadoop dfs -get index /nutch/index
bin/hadoop dfs -get linkdb /nutch/linkdb
bin/hadoop dfs -get segments /nutch/segments
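The three copy commands above could be wrapped in a small script, roughly as sketched below. This is only an illustration, not something from the original message: the variable names HADOOP, LOCAL_DIR and DRY_RUN and the function fetch_from_dfs are hypothetical, and the script defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch: copy the Nutch search data from DFS to the local disk.
# HADOOP, LOCAL_DIR and DRY_RUN are hypothetical names introduced for this example.
HADOOP="${HADOOP:-bin/hadoop}"     # path to the hadoop launcher script
LOCAL_DIR="${LOCAL_DIR:-/nutch}"   # local target directory, as in the commands above
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands, 0 = execute them

fetch_from_dfs() {
    # $1 is the DFS path; the same name is used under LOCAL_DIR
    if [ "$DRY_RUN" = "1" ]; then
        echo "$HADOOP dfs -get $1 $LOCAL_DIR/$1"
    else
        "$HADOOP" dfs -get "$1" "$LOCAL_DIR/$1"
    fi
}

for name in index linkdb segments; do
    fetch_from_dfs "$name"
done
```

Run with DRY_RUN=0 to actually perform the copy, assuming the script is started from the Hadoop installation directory.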
When we run out
In my company we changed the default, and many others probably did the same.
However, we must not ignore the behavior of irresponsible users of
Nutch. For that reason, use of the default must be blocked in code.
Just my 2 cents.
-----Original Message-----
From: Michael Wechner
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ]
Sami Siren reassigned NUTCH-306:
Assign To: Sami Siren
DistributedSearch.Client liveAddresses concurrency problem
--
Key:
[ http://issues.apache.org/jira/browse/NUTCH-122?page=all ]
Sami Siren resolved NUTCH-122:
--
Resolution: Invalid
This is more related to Hadoop.
block numbers need a better random number generator
---
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ]
Sami Siren closed NUTCH-187:
Resolution: Won't Fix
Closed as requested.
Cannot start Nutch datanodes on Windows outside of a cygwin environment
because of DF
I think that Nutch has to solve the problem: if you leave the problem to the
websites, they're more likely to cut you off than they are to implement
their own index storage scheme. Besides, they'd get it wrong, have stale
data, etc.
Maybe what is needed is brainstorming on a shared crawling
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12416379 ]
Chris A. Mattmann commented on NUTCH-258:
-
Thanks for this patch, Chris, even if it is now outdated by NUTCH-303 :-(
Since Nutch no longer uses the deprecated Hadoop