[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-09 Thread Mitar
Hi! I made this ticket [1] to track regaining access to metadata as a dump. [1] https://phabricator.wikimedia.org/T301039 Mitar On Tue, Feb 8, 2022 at 2:32 AM Platonides wrote: > > The metadata used to be included in the image table, but it was changed 6 > months ago out to External

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-07 Thread Platonides
The metadata used to be included in the image table, but it was moved out to External Storage 6 months ago. See https://phabricator.wikimedia.org/T275268#7178983 On Fri, 4 Feb 2022 at 20:44, Mitar wrote: > Hi! > > Will do. Thanks. > > After going through the image table dump, it seems not

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-05 Thread Ariel Glenn WMF
The text table itself is not dumped, because some entries in it may be related to hidden revisions or deleted pages, and thus not viewable by ordinary users. The text id is given in the content dumps as an xml tag before the wrapped wikitext content, and you can associate the items that way.
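As a rough illustration of associating items by text id, here is a minimal Python sketch. It assumes the text id appears as an `id` attribute on the `<text>` element inside each `<revision>`; the sample XML, its values, and the function name `text_ids_by_revision` are made up for the example and are not taken from a real dump.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; real dumps carry an XML namespace
# and many more elements per page and revision.
SAMPLE = b"""<mediawiki>
  <page>
    <title>File:Example.djvu</title>
    <revision>
      <id>1001</id>
      <text id="555" bytes="13">== Summary ==</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def text_ids_by_revision(stream):
    """Map revision id -> text id, streaming so large dumps are not held in memory."""
    mapping = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        rev_id = None
        text_id = None
        # Only direct children are inspected, so a contributor's <id> is ignored.
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif name == "text":
                text_id = child.get("id")  # assumed attribute name
        if rev_id is not None and text_id is not None:
            mapping[rev_id] = int(text_id)
        elem.clear()  # free the subtree that has been processed
    return mapping


if __name__ == "__main__":
    print(text_ids_by_revision(io.BytesIO(SAMPLE)))
```

The resulting mapping could then be joined against whatever table refers to the text ids.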

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-04 Thread Mitar
Hi! Will do. Thanks. After going through the image table dump, it seems not all data is in there. For example, the page count for DjVu files is missing. Instead of metadata in the image table dump, a reference to the text table [1] is provided:

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Ariel Glenn WMF
This looks great! If you like, you might add the link and a brief description to this page: https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people can find and use the library :-) (Anyone else have tools they wrote and use, that aren't on this list? Please add them!) Ariel

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi! If it is useful to anyone else, I have added support for processing SQL dumps directly, without having to load them into a database, to my Go library [1] for processing dumps. So one can process them directly to extract data, like dumps in other formats. [1]
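To illustrate the general idea of reading an SQL dump without a database, here is a minimal Python sketch. It is not the Go library mentioned above and does not reflect its API. It assumes mysqldump-style `INSERT INTO ... VALUES (...),(...);` statements, one per line, with backslash-escaped strings; the function names and the sample row values are made up for the example.

```python
from typing import Iterator, List, Union

Value = Union[str, int, float, None]

# Backslash escapes that mysqldump-style output is assumed to use.
_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a", "b": "\b"}


def _convert(token: str) -> Value:
    """Turn an unquoted token into None, int, or float, else keep the string."""
    if token == "NULL":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_insert_values(line: str) -> Iterator[List[Value]]:
    """Yield one list of column values per row tuple in an INSERT statement."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start == -1:
        return
    i = start + len(" VALUES ")
    while i < len(line):
        if line[i] != "(":
            i += 1  # skip "," between tuples and the trailing ";"
            continue
        i += 1
        row: List[Value] = []
        while True:
            if line[i] == "'":
                i += 1
                chars = []
                while line[i] != "'":
                    if line[i] == "\\":
                        i += 1  # take the escaped character
                        chars.append(_ESCAPES.get(line[i], line[i]))
                    else:
                        chars.append(line[i])
                    i += 1
                i += 1  # step over the closing quote
                row.append("".join(chars))
            else:
                j = i
                while line[j] not in ",)":
                    j += 1
                row.append(_convert(line[i:j]))
                i = j
            if line[i] == ")":
                i += 1
                break
            i += 1  # step over the comma between values
        yield row


if __name__ == "__main__":
    sample = "INSERT INTO `image` VALUES ('A.djvu',10,NULL,'it\\'s, (ok)');"
    for values in parse_insert_values(sample):
        print(values)
```

Rows come back as plain lists, so matching them to column names requires reading the `CREATE TABLE` statement separately.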

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi! I see. Thanks. Mitar On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF wrote: > > The media/file descriptions contained in the dump are the wikitext of the > revisions of pages with the File: prefix, plus the metadata about those pages > and revisions (user that made the edit, timestamp of

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-02 Thread Ariel Glenn WMF
The media/file descriptions contained in the dump are the wikitext of the revisions of pages with the File: prefix, plus the metadata about those pages and revisions (user that made the edit, timestamp of edit, edit comment, and so on). Width and height of the image, the media type, the sha1 of