https://bugzilla.wikimedia.org/show_bug.cgi?id=27114

--- Comment #3 from Adam Wight <s...@ludd.net> 2011-02-24 22:27:31 UTC ---

Well, I happen to agree with you that multiple files are easier to deal with, but the trend seems to be towards the single, huge file. Modern file transfer and storage make the two approaches close to equivalent, so I am in neither camp.

The header and footer could be written as isolated bz2 chunks at a cost of only a few bytes. They would then be easy to verify and to strip back off without a codec. Unfortunately, PHP's bzflush() is a NOP and does not call the underlying bzlib flush, but you could close and reopen the file...

It seems valuable to preserve the metadata of each job's output (see bug #26499), so assuming the pages are organized under a root job-segment element, there is really no header to strip off but the "<?xml version" cruft.

Here's an interesting, if irrelevant, recommendation for a new "XML fragment" representation: http://www.w3.org/TR/xml-fragment Note also section C.3, where they discuss how fragments could be used to index into a huge document in order to minimize parsing. (Yes, I am axe-grinding for bug #27618!)
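To illustrate the isolated-chunk idea: bzip2 permits a file to be a concatenation of independent streams, so a header and footer compressed separately can be appended or stripped as raw byte ranges, with no decompression pass over the body. A minimal sketch in Python (the XML snippets here are hypothetical placeholders, not the actual dump schema):

```python
import bz2

# Compress header, body, and footer as three independent bz2 streams.
# Concatenated, they still form one valid .bz2 file, because bzip2
# readers process multi-stream files transparently.
header = bz2.compress(b"<mediawiki>\n")
body   = bz2.compress(b"<page>...</page>\n")
footer = bz2.compress(b"</mediawiki>\n")

dump = header + body + footer
assert bz2.decompress(dump) == b"<mediawiki>\n<page>...</page>\n</mediawiki>\n"

# Stripping the footer back off is a pure byte-slicing operation --
# no codec needed, as long as the footer chunk's length is known:
stripped = dump[:-len(footer)]
assert bz2.decompress(stripped) == b"<mediawiki>\n<page>...</page>\n"
```

The cost is a few dozen bytes of per-stream overhead, and the payoff is that the wrapper can be verified or replaced without touching the compressed body.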
--- Comment #3 from Adam Wight <s...@ludd.net> 2011-02-24 22:27:31 UTC --- Well, I happen to agree with you that multiple files are easier to deal with, but the trend seems to be towards the single, huge file. Modern file transfer and storage makes the two approaches close to equivalent. I am in neither camp. The header and footer could be created as isolated bz2 chunks at a cost of only a few bytes. Then they will be easy to verify and strip back off without codec. Unfortunately, php's bzflush() is a NOP and does not call the existing bzlib flush, but you could close and reopen the file... It seems valuable to preserve the metadata of each job output (see bug #26499), so assuming the pages are organized under a root job-segment element, there is really no header to strip off but the "<?xml version" cruft. Here's an interesting, if irrelevant, recommendation for a new "xml fragment" representation, http://www.w3.org/TR/xml-fragment Note also section C.3, where they discuss how fragments could be used to index into a huge document in order to minimize parsing. (yes, i am axe-grinding for bug #27618 !) -- Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug. _______________________________________________ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l