What you actually seem to want is iiprop=extmetadata, an imageinfo property that exposes the machine-readable data (https://commons.wikimedia.org/wiki/Commons:Machine-readable_data) embedded in the various templates. The MultimediaViewer project also uses this API.
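A minimal Python sketch of such a request, in case it helps. It assumes format=json rather than xml, and the function names and the User-Agent string are placeholders of mine, not anything defined by the API. The response layout (query -> pages -> imageinfo[0] -> extmetadata, with each key holding a "value") is what I would expect from this property, but check it against a real response.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_url(title):
    """Build a query URL asking only for the extmetadata of one file page."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_extmetadata(response):
    """Flatten a decoded API response into {page title: {key: value}}."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        infos = page.get("imageinfo") or []
        if not infos:
            # Missing pages and non-file pages carry no imageinfo entry.
            continue
        meta = infos[0].get("extmetadata", {})
        result[page.get("title")] = {
            key: entry.get("value") for key, entry in meta.items()
        }
    return result


def fetch_extmetadata(title):
    """Perform the HTTP request (needs network access) and flatten the result."""
    request = urllib.request.Request(
        build_extmetadata_url(title),
        # Placeholder User-Agent; replace with something identifying your tool.
        headers={"User-Agent": "extmetadata-example/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_extmetadata(json.load(resp))
```

Calling fetch_extmetadata("File:Van Gogh - Starry Night - Google Art Project.jpg") should then return one dictionary of metadata keys for that file. Which keys appear depends on how machine-readable the templates on the page are.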
https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=extmetadata|timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content

Where this is not accurate, you might have to fix up some templates to
make them more machine-readable. It's all pretty new, and it's basically
a managed web scraper in itself, but it's probably better to have one web
scraper than several.

DJ

On Tue, Jun 3, 2014 at 10:18 AM, james harvey <[email protected]> wrote:

> Sorry for the email spam. Worked through it, I think. Not too familiar
> with wiki internals. :-)
>
> This particular page doesn't have the content I'm looking for in it. It
> references a template which is used by a few other versions of the same
> image, presumably so the data can be stored once and be given consistently.
> Not being familiar with wiki internals, that was looking to me like it
> wasn't returning the entire page content... But it is, so I'll have to
> recognize this situation and pull referenced templates when the information
> I need isn't already there.
>
>
> On Tue, Jun 3, 2014 at 2:45 AM, james harvey <[email protected]>
> wrote:
>
>> I may have stumbled upon it. If I change the API call from
>> "titles=File:XYZ.jpg" to "titles=Template:XYZ" (note: dropped the .jpg)
>> then it *appears* to get me what I need.
>>
>> Is this correct, or did I run across a case where it appears to work but
>> isn't going to be the right way to go? (Like, I'm not sure if
>> "Template:XYZ" directly relates to the Summary information on the
>> "File:XYZ.jpg" page, or if it's duplicated data that in this case matches.
>> And, I'm confused why the .jpg gets dropped switching "File:" to
>> "Template:")
>>
>> And, will this always get me the full template information, or if someone
>> just updates the "Year" portion, would it only return back that part --
>> since the revisions seem to be returning data as much as they can based on
>> changes from the previous revision, rather than the answer ignoring any
>> other revisions.
>>
>> On Tue, Jun 3, 2014 at 1:59 AM, james harvey <[email protected]>
>> wrote:
>>
>>> Given a Wikimedia Commons description page URL - such as:
>>> https://commons.wikimedia.org/wiki/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
>>>
>>> I would like to be able to programmatically retrieve the information in
>>> the "Summary" header. (Values for "Artist", "Title", "Date", "Medium",
>>> "Dimensions", "Current location", etc.)
>>>
>>> I believe all this information is in "Template:Artwork". I can't figure
>>> out how to get the wikitext/json-looking template data.
>>>
>>> If I use the API and call:
>>> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
>>> <https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Cmime&prop=imageinfo%7Crevisions&rvgeneratexml=&rvprop=ids%7Ctimestamp%7Cuser%7Ccomment%7Ccontent>
>>>
>>> Then I don't get the information I'm looking for. This shows the most
>>> recent revision, and its changes. Unless the most recent revision changed
>>> this data, it doesn't show up.
>>>
>>> To see all the information I'm looking for, it seems I'd have to specify
>>> rvlimit=max and go through all the past revisions to figure out which is
>>> most current.
>>> For example, if I do so and I look at revid 79665032, that
>>> includes: "{{Artwork | Artist = {{Creator:Vincent van Gogh}} | . . . | Year
>>> = 1889 | Technique = {{Oil on canvas}} | . . ."
>>>
>>> Isn't there a way to get the current version in whatever format you'd
>>> call that - the wikitext/json looking format?
>>>
>>> In my API call, I can specify rvexpandtemplates which even with only the
>>> most recent revision gives me the information I need, but it's largely in
>>> HTML tables/divs/etc format rather than wikitext/json/xml/etc.
>>>
>>
>>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
