> What happens when you only inject then generate these urls? Do they get lost?

Good idea. I injected the 98048 missing urls and then ran generate on them:

injection:  98048 urls
generation: 73930 urls

So 24118 urls (24.6%) are still missing. I also tried with a clean list of 1,000,000 .com domains, and about 17% were missing after generation.

> Also, make sure that a normalized version of the same url does not
> appear somewhere else. For example, nutch probably normalizes these to
> "http://1worldtv.mobi/", if you have this url somewhere else nutch
> will naturally only keep one copy.

I double-checked that: all my urls are really unique and clean (no trailing slash or anything like that).

-- 
View this message in context: http://www.nabble.com/generate-process%3A-20--missing-urls-%21-tf4241854.html#a12091070
Sent from the Nutch - User mailing list archive at Nabble.com.
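
For anyone wanting to repeat the uniqueness check described above, here is a rough sketch in Python. It is not Nutch's own normalizer: the function names `normalize` and `find_collisions` are made up for illustration, and the rules (lowercase scheme and host, drop the default port, treat an empty path as "/", drop the fragment) are only an assumption about what a typical URL normalizer does.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the target of the url.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url):
    """Return a simplified normal form of url (assumed rules, not Nutch's)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any user:password part is dropped
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"  # "http://host" and "http://host/" become the same url
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def find_collisions(urls):
    """Map each normal form to the distinct input urls that collapse onto it."""
    groups = defaultdict(set)
    for url in urls:
        groups[normalize(url)].add(url.strip())
    # Keep only normal forms reached by more than one distinct input url.
    return {norm: sorted(group) for norm, group in groups.items() if len(group) > 1}
```

Running `find_collisions` over the seed list and getting an empty dict back would suggest that no two seeds collapse to the same url under these assumed rules. Nutch's configured normalizers may apply different rules, so this is only a first sanity check.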
