On 2010-04-05 16:54, ashokkumar.raveendi...@wipro.com wrote:
> Hi,
> Thank you for your suggestion. I have around 500+ internet URLs
> configured for crawling, and the crawl process runs in the Amazon
> cloud. I have already reduced the depth to 8 and topN to 1000,
> increased fetcher threads to 150, and limited the crawl to 50 URLs per
> host using the generate.max.per.host property. With this configuration,
> Generate, Fetch, Parse, and Update complete in at most 10 hours. The
> segment merge, however, takes a lot of time. As a temporary solution I
> skip the segment merge and index the fetched segments directly, which
> lets me finish the crawl process within 24 hours. Now I am looking for
> a long-term solution to optimize the segment merge step.
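The two knobs mentioned above are ordinary Nutch properties set in conf/nutch-site.xml. As a hedged sketch (the property names generate.max.per.host and fetcher.threads.fetch are standard in Nutch 1.x; the values 50 and 150 are simply the ones quoted in the mail):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Cap URLs selected per host in each generate cycle (value from the mail). -->
  <property>
    <name>generate.max.per.host</name>
    <value>50</value>
  </property>
  <!-- Total fetcher threads (value from the mail). -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>150</value>
  </property>
</configuration>
```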
Segment merging is not strictly necessary unless you have a hundred segments or so. If this step takes too much time, but the number of segments is still well below a hundred, just don't merge them.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
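The merge-free cycle the original poster describes can be sketched as a shell loop, assuming Nutch 1.x command names and the usual crawldb/linkdb/segments layout; the paths and the way the newest segment is located are illustrative, not taken from the thread:

```shell
#!/bin/sh
# Sketch of one crawl run that skips "bin/nutch mergesegs" and indexes
# the per-cycle segments directly. Assumes a Nutch 1.x installation;
# CRAWL, DEPTH, topN and thread counts mirror the figures in the thread.

CRAWL=crawl
DEPTH=8

for i in $(seq 1 $DEPTH); do
  bin/nutch generate $CRAWL/crawldb $CRAWL/segments -topN 1000
  # Pick the segment that generate just created (newest directory).
  SEGMENT=$CRAWL/segments/$(ls -t $CRAWL/segments | head -1)
  bin/nutch fetch $SEGMENT -threads 150
  bin/nutch parse $SEGMENT
  bin/nutch updatedb $CRAWL/crawldb $SEGMENT
done

# No mergesegs step: invert links and index all segments as they are.
bin/nutch invertlinks $CRAWL/linkdb -dir $CRAWL/segments
bin/nutch index $CRAWL/indexes $CRAWL/crawldb $CRAWL/linkdb $CRAWL/segments/*
```

Indexing many small segments this way trades a somewhat slower index job for skipping the merge entirely, which is exactly the 24-hour workaround described above; merging only becomes important once the segment count grows into the hundreds.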