Thanks, that sounds like a good solution.

On Wed, Jul 3, 2013 at 3:04 PM, Petr Onderka <[email protected]> wrote:

> A reply to all those who basically want to keep the current XML dumps:
>
> I have decided to change the primary way of reading the dumps: it will now
> be a command line application that outputs the data as uncompressed XML, in
> the same format as current dumps.
>
> This way, you should be able to use the new dumps with minimal changes to
> your code.
>
> Keeping the dumps in a text-based format doesn't make sense, because that
> can't be updated efficiently, which is the whole reason for the new dumps.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen <[email protected]> wrote:
>
> > Hi,
> >
> > As a regular user of dump files I would not want a "fancy" file format
> > with indexes stored as trees etc.
> >
> > I parse all the dump files (both for SQL tables and the XML files) with a
> > one pass parser which inserts the data I want (which sometimes is only a
> > small fraction of the total amount of data in the file) into my local
> > database. I will normally never store uncompressed dump files, but pipe
> > the uncompressed data directly from bunzip or gunzip to my parser to save
> > disk space. Therefore it is important to me that the format is simple
> > enough for a one pass parser.
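For illustration, the one-pass approach described above might look roughly like the following Python sketch. It assumes the dump uses the usual `<page>`, `<title>` and `<text>` elements; the function names, the sample data and the choice to keep only the last revision text of each page are assumptions made for this example, not something specified in the thread.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page>, reading the stream in one pass.

    If a page has several revisions, only the last revision text is kept.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    title, text = None, None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            root.clear()  # drop finished pages so memory use stays small


# Tiny made-up sample standing in for a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>A</title><revision><text>hello</text></revision></page>
  <page><title>B</title><revision><text>hi</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    # In real use, pass sys.stdin.buffer instead of the in-memory sample.
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        print(title, len(text))
```

With `sys.stdin.buffer` as the stream, a script like this could sit at the end of a pipe such as `bunzip2 -c dump.xml.bz2 | python parse_dump.py` (both file names are hypothetical), so the uncompressed XML never has to be stored on disk.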
> >
> > I cannot really imagine who would use a library with object oriented API
> > to read dump files. No matter what it would be inefficient and have fewer
> > features and possibilities than using a real database.
> >
> > I could live with a binary format, but I have doubts whether it is a good
> > idea. It will be harder to make sure that your parser is working
> > correctly, and you have to consider things like endianness, size of
> > integers, format of floats etc. which give no problems in text formats.
> > The binary files may be smaller uncompressed (which I don't store anyway)
> > but not necessarily when compressed, as the compression will do better on
> > text files.
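The endianness concern can be illustrated with a short Python sketch. The thread does not describe the actual layout of the proposed binary format, so the 4-byte unsigned integer and the `page_id` value below are purely hypothetical.

```python
import struct

page_id = 12345  # hypothetical value, for illustration only

# Binary: the writer must pick an integer size and a byte order.
little = struct.pack("<I", page_id)  # 4 bytes, unsigned, little-endian
big = struct.pack(">I", page_id)     # same value, big-endian

# Reading with the wrong byte order raises no error; it just gives a wrong number.
misread = struct.unpack(">I", little)[0]

# Text: the decimal digits read the same on every platform.
as_text = str(page_id).encode("ascii")
roundtrip = int(as_text)

if __name__ == "__main__":
    print(little.hex(), big.hex(), misread, roundtrip)
```

A reader of a binary format therefore has to know the byte order and field width in advance, whereas a text parser only needs to know that the field is a decimal number.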
> >
> > Regards,
> > - Byrial
> >
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
> >
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>