Petr is right on the mark with this one. The purpose of version 2 of the dumps
is to allow protocol-specific incremental updating of the dump, which would
be significantly more difficult in a non-binary format.
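
To illustrate why in-place updating is easier with a binary layout, here is
a minimal sketch in Python. The fixed-width record layout (two little-endian
32-bit integers for a page ID and a revision ID) and the function names are
assumptions made up for this example, not the format actually proposed for
the dumps.

```python
import io
import struct

# Hypothetical fixed-width record: page id and revision id, both uint32,
# little-endian. An illustration only, not the real dump layout.
RECORD = struct.Struct("<II")


def write_records(f, records):
    """Append (page_id, rev_id) pairs as fixed-width binary records."""
    for page_id, rev_id in records:
        f.write(RECORD.pack(page_id, rev_id))


def update_record(f, index, page_id, rev_id):
    """Overwrite record `index` in place without touching its neighbours."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(page_id, rev_id))


def read_record(f, index):
    """Seek directly to record `index` and decode it."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


dump = io.BytesIO()  # stands in for a dump file on disk
write_records(dump, [(1, 100), (2, 200), (3, 300)])
update_record(dump, 1, 2, 250)  # page 2 gets a new revision
```

Because every record has the same size, its offset is simple arithmetic and
an update touches only those bytes. A compressed XML stream offers no such
offset, so changing one entry generally means rewriting the stream from that
point on.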

--
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | [email protected]


On Mon, Jul 1, 2013 at 2:54 PM, Petr Onderka <[email protected]> wrote:

> Compressed XML is what the current dumps use and it doesn't work well
> because:
> * it can't be edited
> * it doesn't support seeking
>
> I think any format that solves this will be "obscure" and will require
> special code to read and write.
> (And endianness is not a problem if the specification says which one it
> uses and the implementation sticks to it.)
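
A small Python sketch of the endianness point: the byte order is fixed by
the format string rather than by the machine running the code. The choice of
little-endian here and the helper name read_u32 are assumptions for the
example; the thread does not say which order the dump format would use.

```python
import struct

value = 0x01020304

# The same integer serialized in both byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first


def read_u32(data):
    """Decode a uint32 the way a (hypothetical) little-endian spec demands."""
    return struct.unpack("<I", data)[0]


# A reader that follows the spec recovers the value on any host CPU.
decoded = read_u32(little)
```

Problems only appear when writer and reader disagree: decoding the
big-endian bytes with read_u32 yields a different number, not an error.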
>
> Theoretically, I could use compressed XML in internal data structures, but
> I think that just combines the disadvantages of both.
>
> So, size is not the main reason not to use XML; it's just one of the
> reasons.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 7:26 PM, <[email protected]> wrote:
>
> > On 07/01/2013 12:48:11 PM, Petr Onderka - [email protected] wrote:
> >
> >> >
> >> > What is the intended format of the dump files? The page makes it sound
> >> > like it will have a binary format, which I'm not opposed to, but is
> >> > definitely something you should decide on.
> >> >
> >>
> >> Yes, it is a binary format, I will make that clearer on the page.
> >>
> >> The advantage of a binary format is that it's smaller, which I think is
> >> quite important.
> >>
> >
> > In my experience binary formats have very little to recommend them.
> >
> > They are definitely more obscure. They sometimes suffer from endian
> > problems. They require special code to read and write.
> >
> > In my experience I have found that the notion that they offer an
> > advantage by being "smaller" is somewhat misguided.
> >
> > In particular, with XML, there is generally a very high degree of
> > redundancy in the text, far more than in normal writing.
> >
> > The consequence of this regularity is that text-based XML often
> > compresses very, very well.
> >
> > I remember one particular instance where we were generating 30-50
> > Megabytes of XML a day and needed to send it from the USA to the UK every
> > day, in a situation where our leased data rate was really limiting. We
> > were surprised and pleased to discover that zipping the files reduced
> > them to only 1-2 MB. I have been skeptical of claims that binary formats
> > are more efficient on the wire (where it matters most) ever since.
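
The compression effect described above is easy to reproduce. A rough sketch
in Python, using synthetic XML whose element names are invented for this
example; the sizes it prints say nothing about real dump data.

```python
import zlib

# Synthetic, highly repetitive XML; the element names are hypothetical.
pages = "".join(
    f"<page><id>{i}</id><title>Page {i}</title></page>" for i in range(1000)
)
raw = f"<dump>{pages}</dump>".encode("utf-8")

compressed = zlib.compress(raw)
print(len(raw), len(compressed))  # raw size vs. compressed size in bytes

# Round trip: decompression restores the original bytes exactly.
restored = zlib.decompress(compressed)
```

This only addresses size. A plain zlib stream like this one still has to be
decompressed from the start to reach a given page, which is the seeking
problem raised elsewhere in the thread.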
> >
> > I think you should do some experiments versus compressed XML to justify
> > your claimed benefits of using a binary format.
> >
> > Jim
> >
> > <snip>
> >
> > --
> > Jim Laurino
> > [email protected]
> > Please direct any reply to the list.
> > Only mail from the listserver reaches this address.
> >
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
