FYI, I've built a 111-million-page corpus on the latest and greatest code (w/ Lucene 1.4). Hopefully the scp'ing of the indices to the servers will be complete by 10:00 pm EST, so you should be able to run queries then and see the updated results.
Documents have been refreshed three times for the most part, so the scoring should be better than in the current index. http://www.mozdex.com

I'll reply to this message once it's completed, but I thought I would let people know that Nutch/Lucene has worked great thus far for building this index, and our next goal will be 250 million URLs :)

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
