Xqt triaged this task as "High" priority.
Xqt added a comment.

  I see it is important to filter before processing the page:
  
  | **XmlDumpReplacePageGenerator**                    | **old XMLDumpPageGenerator**           | **new XMLDumpPageGenerator**                                          | **-start:An option used**              |
  | filter is made for each dump entry                 | no filtering is made before processing | no filtering is made before processing but entry.text is not assigned | no filtering is made before processing |
  | 55697 entries processed, 12 pages found to process | 271 pages processed until first edit   | 271 pages processed until first edit                                  | 3625 pages processed until first edit  |
  | 1 second                                           | 194 seconds                            | 57 seconds                                                            | 60 seconds                             |
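  
  The first column's approach (filtering each dump entry while streaming, before any page object is built) can be sketched roughly as follows. This is a minimal illustration using the stdlib `xml.etree.ElementTree.iterparse`, not the actual pywikibot implementation; the tiny inline dump and the `filtered_pages` helper are hypothetical:
  
  ```python
  import re
  import xml.etree.ElementTree as ET
  from io import StringIO
  
  # Tiny MediaWiki-style dump fragment, purely for demonstration.
  DUMP = """<mediawiki>
    <page><title>Foo</title><revision><text>alpha needle beta</text></revision></page>
    <page><title>Bar</title><revision><text>no match here</text></revision></page>
    <page><title>Baz</title><revision><text>needle again</text></revision></page>
  </mediawiki>"""
  
  def filtered_pages(source, pattern):
      """Yield (title, text) only for pages whose text matches `pattern`.
  
      Filtering happens while streaming the dump, so non-matching
      entries are discarded before any page processing starts.
      """
      regex = re.compile(pattern)
      for event, elem in ET.iterparse(source, events=("end",)):
          if elem.tag == "page":
              title = elem.findtext("title")
              text = elem.findtext("revision/text") or ""
              if regex.search(text):
                  yield title, text
              elem.clear()  # free already-seen subtrees to keep memory flat
  
  matches = list(filtered_pages(StringIO(DUMP), "needle"))
  # only the two matching pages survive the scan
  ```
  
  Deferring the filter to the generator this way is what keeps the first column down to 12 pages out of 55697 entries instead of handing every page to the bot.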
  
  Unfortunately the old XmlDumpReplacePageGenerator implementation raises an exception while parsing, whereas the pagegenerators implementation does not (maybe because that script was halted after the first 271 pages):
  
    ERROR: ParseError: no element found: line 328465, column 355
    Traceback (most recent call last):
      File "C:\pwb\GIT\core\pwb.py", line 496, in <module>
        main()
      File "C:\pwb\GIT\core\pwb.py", line 480, in main
        if not execute():
      File "C:\pwb\GIT\core\pwb.py", line 463, in execute
        run_python_file(filename, script_args, module)
      File "C:\pwb\GIT\core\pwb.py", line 143, in run_python_file
        exec(compile(source, filename, 'exec', dont_inherit=True),
      File ".\scripts\replace.py", line 1107, in <module>
        main()
      File ".\scripts\replace.py", line 1103, in main
        bot.run()
      File "C:\pwb\GIT\core\pywikibot\bot.py", line 1555, in run
        for item in self.generator:
      File "C:\pwb\GIT\core\pywikibot\pagegenerators.py", line 2240, in PreloadingGenerator
        for page in generator:
      File "C:\pwb\GIT\core\pywikibot\pagegenerators.py", line 1761, in <genexpr>
        return (page for page in generator if page.namespace() in namespaces)
      File ".\scripts\replace.py", line 435, in __iter__
        for entry in self.parser:
      File "C:\pwb\GIT\core\pywikibot\xmlreader.py", line 119, in parse
        for event, elem in context:
      File "C:\Python310\lib\xml\etree\ElementTree.py", line 1260, in iterator
        root = pullparser._close_and_return_root()
      File "C:\Python310\lib\xml\etree\ElementTree.py", line 1307, in _close_and_return_root
        root = self._parser.close()
    xml.etree.ElementTree.ParseError: no element found: line 328465, column 355
    CRITICAL: Exiting due to uncaught exception <class 'xml.etree.ElementTree.ParseError'>
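
  The "no element found" error above is what ElementTree raises when the XML stream ends before the root element is closed, e.g. a truncated dump file. A minimal reproduction (the inline data is hypothetical, not the actual dump):

    ```python
    import xml.etree.ElementTree as ET
    from io import StringIO

    # Truncated dump: the closing </mediawiki> tag is missing.
    TRUNCATED = "<mediawiki><page><title>Foo</title></page>"

    try:
        for event, elem in ET.iterparse(StringIO(TRUNCATED)):
            pass  # entries already parsed are yielded normally
        err = None
    except ET.ParseError as exc:
        # raised in _close_and_return_root(), as in the traceback above
        err = str(exc)
    ```

  Note that `iterparse` happily yields the complete entries first and only fails when it closes the parser at end of input, which is why the error surfaces deep inside the generator chain rather than at startup.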

TASK DETAIL
  https://phabricator.wikimedia.org/T306134
