2018-09-16 22:03 GMT+02:00 Bináris <[email protected]>:

> The bot scanned the latest huwiki dump for 14 hours(!). (Not the whole
> dump, I used -xmlstart.) It went through 820 thousand pages and found 240+
> matches (I displayed every 10th match).
> Then the bot worked a further 30-40 minutes to check the actual pages on
> the live wiki, this time with namespace filtering on. (I don't replace in
> this phase, just save the list, so no human interaction is involved during
> this time.)
> Guess the result! 62 out of 240 remained. This means that the greater part
> of those 14 hours went straight to /dev/null.
> Now I realize how much time I have wasted in the past 10 years. :-(
>

I was not quite right. With the modified code the run took 12 hours instead
of 14; 630,000 pages were scanned instead of 820,000, and 83 matches were
found instead of 240+ (of which 62 are real). But this is still not the same.
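For the record, here is a minimal sketch of what I mean by filtering on
namespace already in the dump phase (this assumes pywikibot's xmlreader API;
the dump file name and the regex are placeholders, not my actual ones):

    import re
    from pywikibot import xmlreader

    PATTERN = re.compile(r'typo')  # placeholder for the real replacement regex
    WANTED_NS = {0}                # main (article) namespace only

    dump = xmlreader.XmlDump('huwiki-latest-pages-articles.xml.bz2')
    for entry in dump.parse():
        # Drop pages outside the wanted namespaces before the expensive
        # regex search, so they never reach the live-wiki check at all.
        if int(entry.ns) not in WANTED_NS:
            continue
        if PATTERN.search(entry.text):
            print(entry.title)

Checking entry.ns while reading the dump should skip the non-article hits
(178 of the 240 in my run) before the regex even runs.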