Re: Nutch slow how to speed up?

Sami Siren Tue, 24 Oct 2006 10:42:26 -0700

If the data you are searching lies in DFS, search will be slow. You need to first copy it out to the local file system, then split your data into smaller slices, which you distribute evenly across your search nodes.

This part of the process is not that well covered, and I am hoping for much improvement in this area from this proposal:


http://mail-archives.apache.org/mod_mbox/lucene-general/200610.mbox/[EMAIL PROTECTED]
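One way to read "distribute evenly" in the advice above is to balance the total bytes of index slices per search node. The Java sketch below assumes you already know the size of each slice after copying it to local disk; the `SliceBalancer` class, the slice and node names, and the greedy largest-slice-first heuristic are illustrative assumptions, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not a Nutch class: assigns index slices to search
// nodes so that the total bytes per node stay roughly equal.
class SliceBalancer {
    static final class Node {
        final String name;
        final List<String> slices = new ArrayList<>();
        long totalBytes = 0;

        Node(String name) { this.name = name; }
    }

    static List<Node> assign(String[] sliceNames, long[] sliceBytes, String[] nodeNames) {
        // Visit slices from largest to smallest.
        Integer[] order = new Integer[sliceNames.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(sliceBytes[b], sliceBytes[a]));

        // Min-heap keyed on bytes assigned so far; ties broken by node name.
        PriorityQueue<Node> heap = new PriorityQueue<>(
                Comparator.<Node>comparingLong(n -> n.totalBytes).thenComparing(n -> n.name));
        List<Node> nodes = new ArrayList<>();
        for (String name : nodeNames) {
            Node node = new Node(name);
            nodes.add(node);
            heap.add(node);
        }

        for (int idx : order) {
            Node lightest = heap.poll();          // node holding the least data so far
            lightest.slices.add(sliceNames[idx]);
            lightest.totalBytes += sliceBytes[idx];
            heap.add(lightest);                   // re-insert with its updated total
        }
        return nodes;                             // same order as nodeNames
    }

    public static void main(String[] args) {
        String[] slices = {"slice-0", "slice-1", "slice-2", "slice-3"};
        long[] bytes = {40L, 30L, 20L, 10L};
        for (Node n : assign(slices, bytes, new String[] {"node-a", "node-b"})) {
            // prints "node-a 50 [slice-0, slice-3]" then "node-b 50 [slice-1, slice-2]"
            System.out.println(n.name + " " + n.totalBytes + " " + n.slices);
        }
    }
}
```

Largest-first greedy is a simple heuristic and does not guarantee a perfect split, but with many similarly sized slices it usually lands close to even.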

--
 Sami Siren



Håvard W. Kongsgård wrote:

DistributedSearch
2x datanodes, 2x Task Trackers

Sami Siren wrote:

You are using DistributedSearch? And the local file system to store the index and related data?


--
 Sami Siren


Håvard W. Kongsgård wrote:

I have Nutch 0.8.1 running on 3 servers (AMD X2 3800 with 4 000 memory); searching with queries like 'China Nuclear Forces' takes 20–25 s.


My config:
http.content.limit = 6165536
dfs.replication = 1
mapred.submit.replication = 2
mapred.child.java.opts = -Xmx800m

My data:
TOTAL urls: 3748140
retry 0: 3614731
retry 1: 85999
retry 2: 20772
retry 3: 26638
min score: 0.0
avg score: 0.64956105
max score: 3922.723
status 1 (DB_unfetched): 1316016
status 2 (DB_fetched): 2168397
status 3 (DB_gone): 263727
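The crawldb statistics above are internally consistent: both the retry buckets and the status buckets add up to TOTAL urls, and roughly 58% of the URLs are in DB_fetched. A quick check in plain Java (no Nutch APIs; the class name is hypothetical) might look like:

```java
// Hypothetical cross-check of the crawldb statistics quoted above.
class CrawlDbStatsCheck {
    static final long TOTAL = 3_748_140L;
    static final long[] RETRY = {3_614_731L, 85_999L, 20_772L, 26_638L}; // retry 0..3
    static final long UNFETCHED = 1_316_016L, FETCHED = 2_168_397L, GONE = 263_727L;

    static long sum(long... values) {
        long s = 0;
        for (long v : values) s += v;
        return s;
    }

    static double fetchedPercent() {
        return 100.0 * FETCHED / TOTAL;
    }

    public static void main(String[] args) {
        System.out.println("retry buckets sum to total: " + (sum(RETRY) == TOTAL));   // true
        System.out.println("status buckets sum to total: "
                + (sum(UNFETCHED, FETCHED, GONE) == TOTAL));                        // true
        System.out.printf("fetched share: %.2f%%%n", fetchedPercent());
    }
}
```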

Status: HEALTHY
Total size: 254534723272 B
Total blocks: 5140 (avg. block size 49520374 B)
Total dirs: 260
Total files: 1466
Over-replicated blocks: 8 (0.15564202 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 1
Real replication factor: 1.0015564

The filesystem under path '/' is HEALTHY
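The derived fields in this fsck report follow from its raw counts, and they show about 254.5 GB (roughly 237 GiB) stored in DFS with a replication factor of 1. The Java sketch below reproduces them, assuming (the report does not say so) that each of the 8 over-replicated blocks carries exactly one extra replica; the class name is hypothetical.

```java
// Hypothetical recomputation of the derived fsck figures from the raw counts.
class FsckMath {
    static final long TOTAL_BYTES = 254_534_723_272L;
    static final long TOTAL_BLOCKS = 5_140L;
    static final long OVER_REPLICATED = 8L;
    static final int TARGET_REPLICATION = 1;

    static long avgBlockSize() {
        return TOTAL_BYTES / TOTAL_BLOCKS; // integer division; matches the report
    }

    static double overReplicatedPercent() {
        return 100.0 * OVER_REPLICATED / TOTAL_BLOCKS;
    }

    static double realReplication() {
        // Assumption: one extra replica per over-replicated block.
        long replicas = TOTAL_BLOCKS * TARGET_REPLICATION + OVER_REPLICATED;
        return (double) replicas / TOTAL_BLOCKS;
    }

    static double totalGiB() {
        return TOTAL_BYTES / (double) (1L << 30);
    }

    public static void main(String[] args) {
        System.out.println("avg block size: " + avgBlockSize() + " B"); // 49520374 B
        System.out.println("over-replicated %: " + overReplicatedPercent());
        System.out.println("real replication: " + realReplication());
        System.out.printf("total size: %.1f GiB%n", totalGiB());
    }
}
```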
