Thanks, that sounds like a good solution.
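
For anyone following the thread, here is a minimal sketch of the one-pass workflow described in the quoted messages below: XML arrives on a stream (for example piped from bunzip2, or from the proposed command line application) and is consumed without ever being stored uncompressed. The function names are hypothetical, and the element names `page` and `title` are assumptions about the dump schema. It is an illustration only, not anyone's actual parser.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(stream):
    """Yield the <title> text of each <page> element in one forward pass.

    `stream` is any binary file-like object, e.g. sys.stdin.buffer when the
    XML is piped in, so the uncompressed data never has to be on disk.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "title":
                yield child.text
                break
        # Drop the page's children once it has been handled. The emptied
        # <page> element itself stays attached to the root.
        elem.clear()
```

Calling `iter_page_titles(sys.stdin.buffer)` in a small script lets the same code sit at the end of a shell pipe, so it should not matter whether the XML comes from a decompressor or from a tool that converts a new dump format back to XML.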

On Wed, Jul 3, 2013 at 3:04 PM, Petr Onderka <[email protected]> wrote:

> A reply to all those who basically want to keep the current XML dumps:
>
> I have decided to change the primary way of reading the dumps: it will now
> be a command line application that outputs the data as uncompressed XML, in
> the same format as current dumps.
>
> This way, you should be able to use the new dumps with minimal changes to
> your code.
>
> Keeping the dumps in a text-based format doesn't make sense, because that
> can't be updated efficiently, which is the whole reason for the new dumps.
>
> Petr Onderka
>
>
> On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen <[email protected]> wrote:
>
> > Hi,
> >
> > As a regular user of dump files I would not want a "fancy" file format
> > with indexes stored as trees etc.
> >
> > I parse all the dump files (both for SQL tables and the XML files) with a
> > one pass parser which inserts the data I want (which sometimes is only a
> > small fraction of the total amount of data in the file) into my local
> > database. I will normally never store uncompressed dump files, but pipe
> > the uncompressed data directly from bunzip or gunzip to my parser to save
> > disk space. Therefore it is important to me that the format is simple
> > enough for a one pass parser.
> >
> > I cannot really imagine who would use a library with object oriented API
> > to read dump files. No matter what it would be inefficient and have fewer
> > features and possibilities than using a real database.
> >
> > I could live with a binary format, but I have doubts if it is a good
> > idea. It will be harder to make sure that your parser is working
> > correctly, and you have to consider things like endianness, size of
> > integers, format of floats etc. which give no problems in text formats.
> > The binary files may be smaller uncompressed (which I don't store anyway)
> > but not necessarily when compressed, as the compression will do better on
> > text files.
> >
> > Regards,
> > - Byrial
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
