While parsing wiki code without dedicated Python tools, I ran into a major problem with template code: regular expressions cannot handle nested structures well. I worked around it with a layman's approach: a parseTemplate routine, in both Python and JavaScript, which converts a template into a simple object (a dictionary plus a list), coupled with another simple routine which rebuilds the template code from the original, or edited, object. The whole thing is, as I said, very rough and was written for personal use only; but if anyone is interested, please ask.
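To give an idea of the approach (the names and exact structure here are my illustration, not Alex's actual code): the trick is to split on `|` only at brace depth zero, so nested `{{...}}` survive intact, and to keep positional parameters in a list and named ones in a dictionary. A minimal Python sketch:

```python
def parse_template(text):
    """Parse one {{...}} template into (name, positional list, named dict).
    Minimal sketch: assumes well-formed braces, no <nowiki> or comments."""
    assert text.startswith("{{") and text.endswith("}}")
    inner = text[2:-2]
    # Split on '|' at depth 0 only, so nested templates stay whole.
    parts, buf, depth, i = [], [], 0, 0
    while i < len(inner):
        if inner.startswith("{{", i):
            depth += 1; buf.append("{{"); i += 2; continue
        if inner.startswith("}}", i):
            depth -= 1; buf.append("}}"); i += 2; continue
        if inner[i] == "|" and depth == 0:
            parts.append("".join(buf)); buf = []
        else:
            buf.append(inner[i])
        i += 1
    parts.append("".join(buf))
    name, positional, named = parts[0].strip(), [], {}
    for p in parts[1:]:
        # An '=' before any nested template marks a named parameter.
        if "=" in p.split("{{")[0]:
            k, _, v = p.partition("=")
            named[k.strip()] = v.strip()
        else:
            positional.append(p.strip())
    return name, positional, named

def build_template(name, positional, named):
    """Rebuild the wikitext from the parsed structure."""
    parts = [name] + positional + [f"{k}={v}" for k, v in named.items()]
    return "{{" + "|".join(parts) + "}}"
```

For example, `parse_template("{{tpl|a|b=c|d={{inner|1}}}}")` yields `("tpl", ["a"], {"b": "c", "d": "{{inner|1}}"})`, and `build_template` round-trips it back to the original wikitext, so the dictionary can be edited in between.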
Alex Brollo

2014-06-08 23:47 GMT+02:00 Merlijn van Deen <[email protected]>:

> On 1 June 2014 01:57, Ricordisamoa <[email protected]> wrote:
>
>> Since gerrit:131263 <https://gerrit.wikimedia.org/r/131263/>, it seems
>> to me that the excellent mwpfh is going to be used more and more
>> extensively within our framework. Am I right? For example, the
>> DuplicateReferences detection and fix in reflinks.py could be brightly
>> refactored without regular expressions. Or are we supposed to do the
>> opposite conversion, where possible?
>
> My preference is to depend on mwpfh where possible - their parser support
> is much better than ours, and it makes much more sense to concentrate
> efforts in one place. However, there's one blocker for this: the Windows
> support of mwpfh. It uses a C extension, and it's hard to build C
> extensions under Windows -- so we'd need to help Windows users along
> installing it in some way. I've updated the issue at
> https://github.com/earwig/mwparserfromhell/issues/68 with some notes for
> that.
>
> Merlijn
>
> _______________________________________________
> Pywikipedia-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
