Petr is right on this one. The purpose of this version 2 for dumps is to allow protocol-specific incremental updating of the dump, which would be significantly more difficult in a non-binary format.

--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | [email protected]

On Mon, Jul 1, 2013 at 2:54 PM, Petr Onderka <[email protected]> wrote:

> Compressed XML is what the current dumps use and it doesn't work well
> because:
> * it can't be edited
> * it doesn't support seeking
>
> I think the only way to solve this is "obscure" and requires special code
> to read and write.
> (And endianness is not a problem if the specification says which one it
> uses and the implementation sticks to it.)
>
> Theoretically, I could use compressed XML in internal data structures, but
> I think that just combines the disadvantages of both.
>
> So, the size is not the main reason not to use XML, it's just one of the
> reasons.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 7:26 PM, <[email protected]> wrote:
>
> > On 07/01/2013 12:48:11 PM, Petr Onderka - [email protected] wrote:
> >
> >> >
> >> > What is the intended format of the dump files? The page makes it sound
> >> > like it will have a binary format, which I'm not opposed to, but is
> >> > definitely something you should decide on.
> >> >
> >>
> >> Yes, it is a binary format, I will make that clearer on the page.
> >>
> >> The advantage of a binary format is that it's smaller, which I think is
> >> quite important.
> >>
> >
> > In my experience binary formats have very little to recommend them.
> >
> > They are definitely more obscure. They sometimes suffer from endian
> > problems. They require special code to read and write.
> >
> > In my experience I have found that the notion that they offer an advantage
> > by being "smaller" is somewhat misguided.
> >
> > In particular, with XML, there is generally a very high degree of
> > redundancy in the text, far more than in normal writing.
> >
> > The consequence of this regularity is that text based XML often compresses
> > very, very well.
> >
> > I remember one particular instance where we were generating 30-50
> > Megabytes of XML a day and needed to send it from the USA to the UK every
> > day, in a situation where our leased data rate was really limiting. We were
> > surprised and pleased to discover that zipping the files reduced them to
> > only 1-2 MB. I have been skeptical of claims that binary formats are more
> > efficient on the wire (where it matters most) ever since.
> >
> > I think you should do some experiments versus compressed XML to justify
> > your claimed benefits of using a binary format.
> >
> > Jim
> >
> > <snip>
> >
> > --
> > Jim Laurino
> > [email protected]
> > Please direct any reply to the list.
> > Only mail from the listserver reaches this address.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
