2018-09-16 22:03 GMT+02:00 Bináris <[email protected]>:
> The bot scanned the latest huwiki dump for 14 hours(!). (Not the whole
> dump, I used -xmlstart.) It went through 820 thousand pages and found
> 240+ matches (I displayed every 10th match).
> Then the bot worked a further 30-40 minutes to check the actual pages
> from the live wiki, this time with namespace filtering on. (I don't
> replace in this phase, just save the list, so no human interaction is
> involved during this time.)
> Guess the result! 62 out of 240 remained. This means that the bigger
> part of these 14 hours went into /dev/null.
> Now I realize how much time I wasted in the past 10 years. :-(
I was not quite right. With the modified code it took 12 hours instead of 14; 630,000 pages were scanned instead of 820,000, and 83 matches were found instead of 240+ (of which 62 are real). But the dump results still do not match the live wiki.
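
For reference, the two-phase workflow described above looks roughly like this. This is a minimal sketch using a plain regex search rather than replace.py's actual fixes machinery; the dump file name, the pattern, and the namespace set are made-up placeholders, and the exception name assumes a recent pywikibot release:

import re

import pywikibot
from pywikibot import xmlreader

DUMP_FILE = 'huwiki-latest-pages-articles.xml.bz2'  # hypothetical path
PATTERN = re.compile(r'some typo')                  # hypothetical pattern
NAMESPACES = {0}                                    # main namespace only

# Phase 1: scan the (possibly stale) XML dump offline.
candidates = []
for entry in xmlreader.XmlDump(DUMP_FILE).parse():
    if PATTERN.search(entry.text):
        candidates.append(entry.title)

# Phase 2: re-check the candidates against the live wiki, this time
# with namespace filtering, and keep only pages that still match.
site = pywikibot.Site('hu', 'wikipedia')
confirmed = []
for title in candidates:
    page = pywikibot.Page(site, title)
    if page.namespace().id not in NAMESPACES:
        continue
    try:
        if PATTERN.search(page.get()):
            confirmed.append(title)
    except pywikibot.exceptions.NoPageError:
        pass  # page was deleted after the dump was taken

print(f'{len(confirmed)} of {len(candidates)} dump matches are still live')

The gap between the two phases is exactly what the numbers above show: the dump is weeks old and has no namespace filter, so most of the offline scan time produces candidates that the much shorter live check then discards.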
