[Bug 27114] do we really need to recombine stub and page file chunks into single huge files?

bugzilla-daemon Thu, 24 Feb 2011 14:27:37 -0800

https://bugzilla.wikimedia.org/show_bug.cgi?id=27114


--- Comment #3 from Adam Wight <s...@ludd.net> 2011-02-24 22:27:31 UTC ---
Well, I happen to agree with you that multiple files are easier to deal with,
but the trend seems to be towards the single, huge file.  Modern file transfer
and storage make the two approaches close to equivalent.  I am in neither
camp.

The header and footer could be created as isolated bz2 chunks at a cost of only
a few bytes.  Then they will be easy to verify and to strip back off without a
codec, i.e. without decompressing anything.  Unfortunately, PHP's bzflush() is
a no-op and does not call the existing bzlib flush, but you could close and
reopen the file...
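
A rough sketch of the close-and-reopen idea, in Python rather than PHP purely
for illustration.  The function names, the sample XML and the idea of recording
per-stream sizes are all assumptions of this sketch, not anything the dump
scripts do today.  It relies on append mode starting a fresh bz2 stream each
time the file is reopened:

```python
import bz2
import os
import tempfile


def write_segmented(path, header, pages, footer):
    """Write header, pages and footer as three separate bz2 streams.

    Closing the file and reopening it in append mode ends one stream and
    starts the next.  Returns the compressed size of each stream in bytes.
    (Hypothetical helper, for illustration only.)
    """
    sizes = []
    open(path, "wb").close()  # start from an empty file
    for part in (header, pages, footer):
        before = os.path.getsize(path)
        with bz2.open(path, "ab") as f:  # each reopen begins a new stream
            f.write(part)
        sizes.append(os.path.getsize(path) - before)
    return sizes


def strip_wrappers(path, sizes):
    """Return the compressed page stream by slicing raw bytes.

    Nothing is decompressed here; only the recorded stream sizes are used.
    """
    with open(path, "rb") as f:
        blob = f.read()
    return blob[sizes[0]:len(blob) - sizes[2]]


if __name__ == "__main__":
    # Sample content is made up; real dumps carry a much longer header.
    header = b'<?xml version="1.0"?>\n<mediawiki>\n'
    pages = b"<page><title>Example</title></page>\n"
    footer = b"</mediawiki>\n"

    path = os.path.join(tempfile.mkdtemp(), "dump.xml.bz2")
    sizes = write_segmented(path, header, pages, footer)

    # The concatenated file still decompresses to the whole document...
    with open(path, "rb") as f:
        assert bz2.decompress(f.read()) == header + pages + footer
    # ...and the middle slice is a valid bz2 stream holding only the pages.
    assert bz2.decompress(strip_wrappers(path, sizes)) == pages
```

The sizes would have to be stored somewhere, or the header and footer streams
matched byte-for-byte against a known copy, so treat this as a starting point
only.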

It seems valuable to preserve the metadata of each job output (see bug #26499),
so assuming the pages are organized under a root job-segment element, there is
really no header to strip off but the "<?xml version" cruft.

Here's an interesting, if irrelevant, recommendation for a new "xml fragment"
representation,
    http://www.w3.org/TR/xml-fragment
Note also section C.3, where they discuss how fragments could be used to index
into a huge document in order to minimize parsing.  (Yes, I am axe-grinding for
bug #27618!)

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
