I've had problems trying to access truncated segments in the past. The process would hang when I tried to read the segment. Have you tried using the segread tool to see if it can be accessed correctly? Have you tried repairing the segment? One week for 4 million records is far too long, so I would say that something isn't right.
I don't think there's an optimum segment size, although I find that smaller segments are easier to manage. Also, with multiple segments you can parallelize the indexing process by indexing on separate machines, or even by running multiple indexing processes on the same machine. This helps especially if you're using resource-intensive index plugins (like language ID).

On 4/13/05, Luke Baker <[EMAIL PROTECTED]> wrote:
> Hey,
>
> Is there some sort of optimal or maximum segment size? I have a segment
> with 3.9 million records and it appears to be taking a really long time
> to index. The index process has been optimizing the index for over a
> week. The server I'm running it on is a dual Xeon 3.0 Ghz with 2GB of
> RAM. I've done 2 million page segments before and the optimizing has
> taken about 48 hours.
>
> Would a truncated segment cause the optimizing process to take a really
> long time? I would guess that the optimizing process would just be
> manipulating the index that already has been created and that nothing in
> the segment itself would cause the optimizing part to take a really long
> time.
>
> I have confirmed that the process is still running and modifying files
> in the index directory. Would the underlying filesystem play any role
> in all this? I'm using ext3.
>
> Thanks,
>
> Luke

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
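
For what it's worth, the idea of running several indexing processes on the same machine could be sketched roughly like this in Python. This is only a minimal sketch: the helper name `index_segments` is made up for illustration, and the command you pass in (something like `bin/nutch index`) is an assumption you should check against your own Nutch version before relying on it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def index_segments(segment_dirs, command_prefix, max_workers=2):
    """Run one indexing process per segment, at most max_workers at a time.

    command_prefix is the command to run (ASSUMPTION: supplied by the caller);
    the segment path is appended as its final argument.
    Returns a dict mapping each segment path to the process exit code.
    """

    def run_one(segment):
        # Each segment gets its own OS process; the thread only waits on it.
        completed = subprocess.run(list(command_prefix) + [segment])
        return segment, completed.returncode

    # The pool size caps how many indexing processes run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, segment_dirs))
```

You would call it with your own segment directories and command, for example `index_segments(["segments/a", "segments/b"], ["bin/nutch", "index"])`, where both the paths and the command are placeholders. A non-zero exit code in the result points at a segment worth inspecting, and `max_workers` is best kept small on a box with limited RAM.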
