https://bugzilla.wikimedia.org/show_bug.cgi?id=47407
--- Comment #2 from Kiran Mathew Koshy <[email protected]> ---
I have implemented a primitive version of the above tool:
https://github.com/kiranmathewkoshy/zimcheck/

It implements the following checks:
1- Internal checksum
2- Verify that there are no online dependencies
3- Check for all metadata entries
4- Verify favicon.png
5- Main Page header
6- Duplicate content

Although the search for duplicate content was initially slow on large files, I have managed to speed it up so that it runs in less than 2 minutes on the 2.6 GB Wikipedia ZIM file. However, checking internal URLs is still slow, and since it is a CPU-intensive process, I have decided to divide the work across a few threads.

Also note that the regex library used is part of C++11, and I'm not aware whether the rest of zimlib is compatible with C++11.
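The comment does not show zimcheck's code, so the following is only a minimal C++11 sketch, not the tool's actual implementation, of two of the ideas described: a duplicate-content check and an online-dependency scan using the C++11 <regex> library, with the scan split across a few threads. `Article`, `findDuplicates`, `hasOnlineDependency` and `findOnlineDependencies` are hypothetical names; zimlib's real API is not used, and the regex pattern is an assumption about what counts as an online dependency. Bucketing articles by a content hash means full string comparisons only happen within a bucket, instead of comparing every pair of articles, which is one common way to make duplicate detection fast on large files.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ZIM article (not a zimlib type).
struct Article {
    std::string url;
    std::string content;
};

// Duplicate detection: bucket articles by a hash of their content, then
// compare full contents only inside a bucket (hash collisions are possible).
std::vector<std::pair<std::string, std::string>>
findDuplicates(const std::vector<Article>& articles) {
    std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
    std::hash<std::string> hasher;
    for (std::size_t i = 0; i < articles.size(); ++i)
        buckets[hasher(articles[i].content)].push_back(i);

    std::vector<std::pair<std::string, std::string>> dups;
    for (const auto& kv : buckets) {
        const std::vector<std::size_t>& idx = kv.second;
        for (std::size_t a = 0; a < idx.size(); ++a)
            for (std::size_t b = a + 1; b < idx.size(); ++b)
                if (articles[idx[a]].content == articles[idx[b]].content)
                    dups.emplace_back(articles[idx[a]].url, articles[idx[b]].url);
    }
    return dups;
}

// Assumed rule: a src/href attribute pointing at an absolute http(s) or
// protocol-relative URL is treated as an online dependency.
bool hasOnlineDependency(const std::string& html) {
    static const std::regex external(R"((src|href)\s*=\s*["'](https?:)?//)",
                                     std::regex::icase);
    return std::regex_search(html, external);
}

// Splits the article range into contiguous chunks, one per worker thread.
// Each worker writes only its own slots of `flagged`, so no locking is needed.
std::vector<std::string>
findOnlineDependencies(const std::vector<Article>& articles, unsigned nThreads) {
    if (nThreads == 0) nThreads = 1;
    std::vector<char> flagged(articles.size(), 0);  // char, not bool: no shared bit-packed words
    std::vector<std::thread> workers;
    const std::size_t chunk = (articles.size() + nThreads - 1) / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(articles.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&articles, &flagged, begin, end]() {
            for (std::size_t i = begin; i < end; ++i)
                flagged[i] = hasOnlineDependency(articles[i].content) ? 1 : 0;
        });
    }
    for (auto& w : workers) w.join();

    std::vector<std::string> urls;
    for (std::size_t i = 0; i < articles.size(); ++i)
        if (flagged[i]) urls.push_back(articles[i].url);
    return urls;
}
```

The same chunk-per-thread pattern could in principle be applied to the internal-URL check, since each article can be examined independently; the number of threads would typically be tuned to the available cores (for example via `std::thread::hardware_concurrency()`).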
