Jheald added a comment.
@bert Interesting proposal. But for me it raises some issues: firstly, about the business case for it; secondly, regarding implementation.

Business case first. At the moment we store this information within the Commons MapWarper app. We should clarify why this is unsatisfactory or suboptimal, and what the new structure would aim to achieve over what exists at present. More visibility, more transparency and more obvious accessibility might be things on that list. Are there other ways in which we hope what we build would improve on what presently exists?

Secondly, implementation. If we want to store the information in a Wikimedia Commons environment, then it needs to be built on two things: Commons structured-data statements, and Commons data objects. Currently two forms of Commons data objects are defined: tabular data objects ( https://www.mediawiki.org/wiki/Help:Tabular_Data ) and map data (GeoJSON) objects ( https://www.mediawiki.org/wiki/Help:Map_Data ); but additional formats could possibly be added.

A question is: what information should go where? Currently we have a file and then a link to a georeferencer app, eg:

- File:Northern provinces of the United States - drawn and engraved for Thomson's New general atlas, 1817; Hewitt Sc. ... NYPL434391.tiff <https://commons.wikimedia.org/wiki/File:Northern_provinces_of_the_United_States_-_drawn_and_engraved_for_Thomson%27s_New_general_atlas,_1817;_Hewitt_Sc._..._NYPL434391.tiff> -> http://maps.nypl.org/warper/maps/13071#Preview_tab (NYPL MapWarper)
- File:Pigot and Co (1842) p2.138 - Map of Lancashire.jpg <https://commons.wikimedia.org/wiki/File:Pigot_and_Co_(1842)_p2.138_-_Map_of_Lancashire.jpg> -> http://britishlibrary.georeferencer.com/id/11020006456 (Klokan version 2)
- File:Larousse, Plan de Paris, 1900 - David Rumsey.jpg <https://commons.wikimedia.org/wiki/File:Larousse,_Plan_de_Paris,_1900_-_David_Rumsey.jpg> -> https://davidrumsey.georeferencer.com/maps/553129769171/view (Klokan version 4)
- File:1768 Jeffreys Wall Map of India and Ceylon - Geographicus - India-jeffreys-1768.jpg <https://commons.wikimedia.org/wiki/File:1768_Jeffreys_Wall_Map_of_India_and_Ceylon_-_Geographicus_-_India-jeffreys-1768.jpg> -> https://warper.wmflabs.org/maps/1998#Preview_tab (Commons MapWarper)

More maps with georeferencing can be found in sub-categories of https://commons.wikimedia.org/wiki/Category:Maps_with_georeferencing

Bert's proposal sounds like a suggestion for an additional type of Commons data object, with a specified JSON structure. This would likely require changes to the MediaWiki code itself, which might take some time to arrive; and would it necessarily store the data where we want it?

For maximum visibility, and accessibility through SPARQL queries, an alternative approach would be to store much of the georeferencing metadata as structured-data (SDC) statements directly on the file's own metadata page. To group everything together, one would probably want a single master statement, with further information added as qualifiers.

A couple of options suggest themselves for the master statement. One might be for it to give a link to a geo-rectified version of the map, stored statically as an image on Commons.
Qualifiers would then be used to state the mask and other parameters used to generate the transformation. Multiple master statements could be used to link to different re-projections of different parts of the map.

Alternatively, it might be more flexible to make the master statement a definition of a particular part of the map, with the link to a geo-rectified version then being one of the qualifiers. This might fit better with syntax for annotating particular regions of an image: stage 1 of a process might be to say that part of an image depicted, say, Orkney and was an inset map; stage 2 might be to identify a detailed mask or outline for that sub-part of the image; stage 3 might be to add georeferencing to it, potentially with several months separating each stage. The preferred data model for annotating part of an image in SDC hasn't really been thrashed out yet. But it may be that the top-level master statement would be //what// that region of the image depicts, with a qualifier saying //where// in the image it is (perhaps by box, perhaps by mask), and then further qualifiers specifying metadata about the georeferencing.

One limitation of SDC is that currently one can't have a qualifier on a qualifier, so if one wanted to note additional things about any of the qualifiers, one couldn't. To some extent it may be possible to work round this, but this may be a limitation of the SDC model that will ultimately have to be revisited.

The detailed data for control points (the GCPS data) is //not// suitable for storage as structured-data statements. From experience with some of the BL georeferencing, some maps can have 200 or more control points. Putting all of these in structured data would make the data page unreadable. Instead, the best solution might be to store them as a Commons tabular-data object, with a qualifier on the master statement pointing to the Commons data page holding that table.
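As a rough illustration of the tabular-data option, here is a minimal Python sketch that packs a list of control points into the JSON layout described at Help:Tabular_Data (a `license`, a `description`, a `schema` with typed `fields`, and `data` rows). The column names (`pixel_x`, `pixel_y`, `lon`, `lat`), the helper `gcps_to_tabular` and the sample points are hypothetical, chosen for illustration; only the top-level page layout follows the documented format.

```python
import json
from typing import NamedTuple


class GCP(NamedTuple):
    """One ground control point: an image pixel matched to a lon/lat position."""
    pixel_x: float
    pixel_y: float
    lon: float
    lat: float


def gcps_to_tabular(gcps: list, description: str, license_id: str = "CC0-1.0") -> dict:
    """Build a dict in the Commons tabular-data (.tab) page layout.

    The column names are hypothetical; the top-level keys (license,
    description, schema/fields, data) follow Help:Tabular_Data.
    """
    fields = [
        {"name": "pixel_x", "type": "number", "title": {"en": "Image x (pixels)"}},
        {"name": "pixel_y", "type": "number", "title": {"en": "Image y (pixels)"}},
        {"name": "lon", "type": "number", "title": {"en": "Longitude (WGS84)"}},
        {"name": "lat", "type": "number", "title": {"en": "Latitude (WGS84)"}},
    ]
    return {
        "license": license_id,
        "description": {"en": description},
        "schema": {"fields": fields},
        # One row per control point, in the same column order as the schema.
        "data": [list(p) for p in gcps],
    }


if __name__ == "__main__":
    # Hypothetical points for an Orkney inset map.
    points = [GCP(120.0, 340.5, -3.05, 59.0), GCP(980.0, 310.0, -2.7, 59.1)]
    print(json.dumps(gcps_to_tabular(points, "Control points for the Orkney inset"), indent=2))
```

The resulting JSON could be saved as a `Data:….tab` page on Commons, and the qualifier on the master statement would then only need to hold that page name, keeping the file's structured data readable however many control points there are.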
Looking at the Klokan georeferencer v2 data structure (click "view source" on the georectified map page), there are some additional metadata fields that we may wish to consider. In the GCPS data, Klokan notes the source layer that each point was georeferenced against, and also its zoom level. It also notes the zoom level of the original map. There are scenarios where this information might be useful: for example, if one of the source layers turned out to be rather badly georeferenced, the points georeferenced against it ought to be redone; it might also help track which source layers, at which zoom levels, are most useful for georeferencing. (Also, some sources package together different georeferenced layers at different zoom levels, so it might be one particular layer that was badly georeferenced.) This information may not be crucial, but if it is available (and eg all BL crowdsourced data is licensed CC0), then we may wish to represent it. I haven't dug into the Klokan v4 spec in any detail to see whether it has additional forms of pointwise data.

In terms of global data, the Klokan v2 format also includes the coordinate bounding box and centre coordinates, descriptions of the image source, timestamps, user stamps, versioning info, etc. We may wish to think about all of these, even if the wiki might store some of them for us for free.

Versioning is a potential challenge we would need to think about, since an update to the georeferencing might change all of: the GCPS block, the georectified image, and the summary statements in the original image's SDC. To look back at a previous version of the georeferencing, one would need to keep all of these in sync. So statements linking eg from the original image to the GCPS block or the georectified image (and vice versa) may need to link to a specific version of a file as it was at a particular time; otherwise "undo"s of part of the data may cause real difficulties.
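To make the per-point metadata concrete, here is a minimal Python sketch, assuming each control-point record carries the source layer and zoom level it was georeferenced against, as the Klokan v2 GCPS data does. The record fields, layer names and helper functions are hypothetical and are not Klokan's actual schema. It shows two of the uses mentioned above: flagging the points that would need redoing if one source layer (optionally at one zoom level) proved badly georeferenced, and deriving a bounding box and centre. Taking the extent from the control points alone is a crude stand-in; a real georeferencer would more likely compute it from the warped image footprint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlPoint:
    """A control point plus hypothetical provenance fields."""
    pixel_x: float
    pixel_y: float
    lon: float
    lat: float
    source_layer: str  # reference layer the point was matched against
    source_zoom: int   # zoom level of that reference layer when matching


def points_to_redo(points: list, bad_layer: str, bad_zoom: Optional[int] = None) -> list:
    """Return points matched against a layer later found to be badly georeferenced.

    If bad_zoom is given, only points taken at that zoom level are flagged,
    since a source may bundle separately georeferenced layers per zoom level.
    """
    return [
        p for p in points
        if p.source_layer == bad_layer and (bad_zoom is None or p.source_zoom == bad_zoom)
    ]


def bbox_and_centre(points: list) -> tuple:
    """Return ((west, south, east, north), (centre_lon, centre_lat)) from the points.

    Ignores the antimeridian; a simplification for illustration only.
    """
    lons = [p.lon for p in points]
    lats = [p.lat for p in points]
    west, east = min(lons), max(lons)
    south, north = min(lats), max(lats)
    return (west, south, east, north), ((west + east) / 2, (south + north) / 2)


if __name__ == "__main__":
    pts = [
        ControlPoint(10, 20, -3.2, 58.9, "osm", 12),
        ControlPoint(500, 40, -2.4, 59.3, "historic-os", 14),
        ControlPoint(300, 700, -2.9, 58.7, "historic-os", 16),
    ]
    print(len(points_to_redo(pts, "historic-os", bad_zoom=14)))  # 1
    print(bbox_and_centre(pts))
```

Keeping the provenance columns alongside the coordinates (for instance as extra columns in the tabular-data object) would be enough to support this kind of query later, even if nobody needs it today.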
This may be something we need to think about; or perhaps it will be enough if the georeferencer app is aware of the issue and keeps look-backs synchronised.

Above I have suggested storing the georectified map as a materialised file in its own right on Commons. This is another thing we may need to think about. Traditionally, I think, the warped versions of images have been generated on the fly by the warper. But for presentation on wiki pages, or for use in external applications, it may be useful to have the geo-rectified image cached as an actual overlay image that can then be used directly, and more easily, by all manner of third-party software or add-ons. The trade-off between additional storage space and additional processing demand may need to be assessed.

As an additional consideration, georeferencer apps have typically offered multiple warping options, eg affine transformation, global polynomial interpolation, or local spline interpolation. One might even want to offer additional options: eg perhaps an angle-preserving rotation-scaling transformation, or projection estimation and direct inversion, perhaps with additional interpolation on top of that. In such cases, should one materialise and offer multiple georectified alternatives? Or just one? Or allow them to be displayed in the georeferencer app, with the user then having to specifically "save" one to change the one preserved? An additional complication, but I still think it is worth thinking about.

A final thought: if we want to be able to produce demos of different data modelling, it will be useful to be able to create new properties at will for a test instance of Commons SDC at Wikimania. Does that mean attaching it to a fully loaded (at least as regards properties) test instance of Wikidata? That may be something to think about before the event, and not just for this project.
(eg if there were workshops to develop different potential SDC modellings of GLAM metadata, they might want to be able to create test versions of properties, too. Pinging @SandraF_WMF.)

TASK DETAIL
https://phabricator.wikimedia.org/T227036
Wikidata-bugs mailing list https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
