Hi,

Running Nutch 1.2 on a single machine, I produce a 2 GB index (157 segments, slice size 50,000). Because performance is low, I would like to test further crawls on Amazon EC2.

Q1: If I start with 4 nodes, should I divide the number of segments proportionally across the nodes and then start a new crawl?

Q2: Analysing the log files of the single-machine run, I found that the most time-consuming steps are indexing (13 hours), updating the crawl database (11 hours), updating the LinkDB (7 hours 24 minutes), etc. Assuming the 4-node setup, will the indexing and database-update times decrease proportionally?
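For context, the steps I am timing correspond to the usual Nutch 1.x crawl-cycle commands; a sketch of one cycle (directory paths and -topN value are placeholders, assuming a deployed Nutch/Hadoop installation) looks roughly like:

```shell
# One Nutch 1.x crawl cycle; crawl/ paths and -topN are placeholders.
bin/nutch generate crawl/crawldb crawl/segments -topN 50000
SEGMENT=$(ls -d crawl/segments/* | tail -1)   # newest segment
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawl/crawldb "$SEGMENT"                # "update crawl database" step
bin/nutch invertlinks crawl/linkdb -dir crawl/segments     # "update LinkDB" step
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*   # indexing step
```

On a Hadoop cluster each of these commands runs as a MapReduce job, so this is where any speedup from more nodes would show up.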
Thanks,
Patricio

