Jheald added a comment.

  @bert Interesting proposal. But for me it raises two sets of issues: first,
the business case for it; second, its implementation.
  
  Business case first. At the moment we store this information within the
Commons MapWarper app. We should spell out why this is unsatisfactory or
suboptimal, and what the new structure would aim to achieve over what exists at
present. More visibility, more transparency and more obvious accessibility
might be things on that list. Are there other ways in which we hope that what
we build would improve on what exists now?
  
  Secondly, implementation. If we want to store the information in a Wikimedia
Commons environment, then it needs to be built on two things: Commons
structured-data statements and Commons data objects. Currently two forms of
Commons data objects are defined: tabular data objects (
https://www.mediawiki.org/wiki/Help:Tabular_Data ) and map data objects, which
hold GeoJSON shapes ( https://www.mediawiki.org/wiki/Help:Map_Data ); but
additional formats could possibly be added.
  
  A question is: what information should go where.
  
  Currently we have a file and then a link to a georeferencer app, for example:
  
  - File:Northern provinces of the United States - drawn and engraved for 
Thomson's New general atlas, 1817; Hewitt Sc. ... NYPL434391.tiff 
<https://commons.wikimedia.org/wiki/File:Northern_provinces_of_the_United_States_-_drawn_and_engraved_for_Thomson%27s_New_general_atlas,_1817;_Hewitt_Sc._..._NYPL434391.tiff>
 ->  http://maps.nypl.org/warper/maps/13071#Preview_tab  (NYPL MapWarper)
  - File:Pigot and Co (1842) p2.138 - Map of Lancashire.jpg 
<https://commons.wikimedia.org/wiki/File:Pigot_and_Co_(1842)_p2.138_-_Map_of_Lancashire.jpg>
 -> http://britishlibrary.georeferencer.com/id/11020006456   (Klokan version 2)
  - File:Larousse, Plan de Paris, 1900 - David Rumsey.jpg 
<https://commons.wikimedia.org/wiki/File:Larousse,_Plan_de_Paris,_1900_-_David_Rumsey.jpg>
 -> https://davidrumsey.georeferencer.com/maps/553129769171/view  (Klokan 
version 4)
  - File:1768 Jeffreys Wall Map of India and Ceylon - Geographicus - 
India-jeffreys-1768.jpg 
<https://commons.wikimedia.org/wiki/File:1768_Jeffreys_Wall_Map_of_India_and_Ceylon_-_Geographicus_-_India-jeffreys-1768.jpg>
 -> https://warper.wmflabs.org/maps/1998#Preview_tab (Commons MapWarper)
  
  More maps with georeferencing can be found in sub-categories of 
https://commons.wikimedia.org/wiki/Category:Maps_with_georeferencing
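  As a first step towards any migration, one might want to survey which app
each existing link points to. The sketch below is purely illustrative: it
classifies links by host and path. The rules, in particular telling Klokan v2
from v4 by an /id/ versus /maps/ path, are inferred only from the four examples
above and are assumptions, not documented URL schemes.

```python
from urllib.parse import urlparse

# Hypothetical rules inferred ONLY from the four example links above;
# real links in the categories will certainly include other shapes.
# Host patterns starting with "." match any subdomain.
RULES = [
    ("maps.nypl.org", "/warper/", "NYPL MapWarper"),
    ("warper.wmflabs.org", "/maps/", "Commons MapWarper"),
    (".georeferencer.com", "/id/", "Klokan version 2"),
    (".georeferencer.com", "/maps/", "Klokan version 4"),
]

def classify_georef_link(url: str) -> str:
    """Return a best guess at which georeferencer app a link points to."""
    parsed = urlparse(url)
    host, path = parsed.hostname or "", parsed.path  # fragment is dropped
    for host_pattern, path_prefix, app in RULES:
        if host_pattern.startswith("."):
            host_ok = host.endswith(host_pattern)
        else:
            host_ok = host == host_pattern
        if host_ok and path.startswith(path_prefix):
            return app
    return "unknown"
```

Anything coming back as "unknown" would flag a link pattern not covered by
these guessed rules.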
  
  Bert's proposal sounds like a suggestion for an additional type of Commons
data object, with a specified JSON structure. This would likely require changes
to the MediaWiki code itself, which might take some time to arrive; and would it
necessarily store the data where we want it?
  
  For maximum visibility, and accessibility through SPARQL queries, an 
alternative approach would be to store much of the georeferencing metadata as 
structured-data (SDC) statements directly on the metadata page for the file.  
To group everything together, one would probably want a single
master-statement, with further information added as qualifiers. A couple of
options suggest themselves for the master-statement. One would be for it to
link to a georectified version of the map, stored statically as an image on
Commons. Qualifiers would then record the mask and other parameters used to
generate the transformation. Multiple master-statements could be used to link
to different re-projections of different parts of the map.
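  To make the first option concrete, here is a rough sketch of what such a
master-statement might look like, loosely modelled on Wikibase statement JSON
(a mainsnak plus a map of qualifiers). Every property ID (P_GEORECTIFIED_VERSION,
P_MASK, P_WARP_METHOD, P_GCP_TABLE), every file name, and the idea of a mask as
a WKT polygon in pixel space are hypothetical placeholders, not existing SDC
properties or a settled model.

```python
import json

# All property IDs are HYPOTHETICAL placeholders, not real SDC properties.
# Simplification: every datavalue is typed "string" here.
def value_snak(prop, value):
    return {"snaktype": "value", "property": prop,
            "datavalue": {"value": value, "type": "string"}}

def georectified_statement(rectified_file, mask_wkt, method, gcp_table):
    """Option 1: the main value is the georectified image; the parameters
    used to produce it are attached as qualifiers."""
    qualifiers = {
        "P_MASK": [value_snak("P_MASK", mask_wkt)],
        "P_WARP_METHOD": [value_snak("P_WARP_METHOD", method)],
        "P_GCP_TABLE": [value_snak("P_GCP_TABLE", gcp_table)],
    }
    return {"type": "statement", "rank": "normal",
            "mainsnak": value_snak("P_GEORECTIFIED_VERSION", rectified_file),
            "qualifiers": qualifiers}

stmt = georectified_statement(
    "Example map (georectified).tif",                   # hypothetical file
    "POLYGON((0 0, 4000 0, 4000 3000, 0 3000, 0 0))",   # mask in pixels
    "polynomial order 1",
    "Data:Example map GCPs.tab")                        # hypothetical Data: page
print(json.dumps(stmt, indent=2))
```

Re-projections of different parts of the map would then simply be several such
statements on the same file.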
  
  Alternatively, it might be more flexible to make the master-statement a
definition of a particular part of the map, with the link to a georectified
version then being one of the qualifiers. This might fit better with syntax for
annotating particular regions of an image. Stage 1 of a process might be to
say that part of an image depicts, say, Orkney and is an inset map; stage 2
might be to identify a detailed mask or outline for that sub-part of the image;
stage 3 might be to add georeferencing to it, potentially with several months
separating each stage. The preferred data model for annotating part of an
image in SDC hasn't really been thrashed out yet. But it may be that the
top-level master statement would be //what// that region of the image depicts,
with a qualifier saying //where// in the image it is (perhaps by box, perhaps
by mask), and then further qualifiers specifying metadata about the
georeferencing. One limitation of SDC is that currently one can't have a
qualifier on a qualifier, so if one wanted to note additional things about any
of the qualifiers, one couldn't. To some extent it may be possible to work round
this, but this may be a limitation of the SDC model that will ultimately have
to be revisited.
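  The staged approach could look something like the following sketch, in which
qualifiers are added to one region statement over time. The property names are
hypothetical, and the "pct:x,y,w,h" box borrows IIIF region syntax purely as one
possible encoding. The structure deliberately gives a qualifier value nowhere to
hang further qualifiers, mirroring the SDC limitation just described.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL sketch of option 2: the main statement says WHAT a region
# depicts; qualifiers, added at different stages, say WHERE it is and how it
# was georeferenced. Property names are placeholders, not real SDC properties.
@dataclass
class RegionStatement:
    depicts: str
    # property -> list of plain values; there is no slot for qualifiers on a
    # qualifier, since SDC currently offers none.
    qualifiers: dict[str, list[str]] = field(default_factory=dict)

    def add_qualifier(self, prop: str, value: str) -> None:
        # This sketch only accepts plain string values.
        if not isinstance(value, str):
            raise TypeError("qualifier values here are plain strings; "
                            "they cannot carry qualifiers of their own")
        self.qualifiers.setdefault(prop, []).append(value)

# Stage 1: part of the image depicts Orkney and is an inset map.
inset = RegionStatement(depicts="Orkney")
inset.add_qualifier("P_REGION_ROLE", "inset map")
# Stage 2 (perhaps months later): say where in the image the region is.
inset.add_qualifier("P_IMAGE_REGION", "pct:70,5,25,20")
# Stage 3: attach the georeferencing, e.g. a pointer to a GCP table.
inset.add_qualifier("P_GCP_TABLE", "Data:Example map Orkney inset GCPs.tab")
```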
  
  The detailed control-point data (the GCPs) is //not// suitable for storage
as structured-data statements. From experience with some of the BL
georeferencing, some maps can have 200 or more control points. Putting all of
these into structured data would make the file's data page unreadable. Instead,
the best solution for these might be to store them as a Commons tabular data
object, with a qualifier on the master-statement pointing to the Data: page
holding that table.
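  For illustration, a GCP table might be produced roughly as below. The
top-level layout (license, description, schema.fields, data) follows
Help:Tabular_Data as I understand it; the column names, titles and the two
sample points are made up for the example.

```python
import json

# Column definitions are HYPOTHETICAL; field types follow the tabular-data
# schema ("number" here). Names must be simple identifiers.
FIELDS = [
    ("pixel_x", "number", "Image x (pixels)"),
    ("pixel_y", "number", "Image y (pixels)"),
    ("lon", "number", "Longitude (WGS84)"),
    ("lat", "number", "Latitude (WGS84)"),
]

def gcps_to_tabular(gcps, description):
    """Build a tabular-data page body from (pixel_x, pixel_y, lon, lat) rows."""
    return {
        "license": "CC0-1.0",
        "description": {"en": description},
        "schema": {"fields": [
            {"name": name, "type": typ, "title": {"en": title}}
            for name, typ, title in FIELDS]},
        "data": [list(row) for row in gcps],
    }

page = gcps_to_tabular(
    [(120.5, 340.0, -3.0, 59.0), (880.0, 95.25, -2.4, 59.3)],  # made-up points
    "Ground control points for an example map")
print(json.dumps(page, indent=1))
```

A map with hundreds of points just means more rows, without touching the
file's own data page.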
  
  Looking at the Klokan georeferencer v.2 data structure (click "view source" 
on the georectified map page), there are some additional metadata fields there 
that we may wish to consider.  In the GCPS data, Klokan notes the source layer 
that the point was georeferenced against, and also its zoom level.  It also 
notes the zoom level of the original map.  There are scenarios where this 
information might be useful - for example, if one of the source layers turned 
out to be rather badly georeferenced, so that points georeferenced against it 
ought to be re-done; also, perhaps, to track which source layers at which zoom 
levels are most useful for georeferencing.  (Also some sources package together 
different georeferenced layers at different zoom levels, so it might be a 
particular layer that was badly georeferenced).  This information may not be 
crucial, but if it is available (and eg all BL crowdsourced data is licensed 
CC0), then we may wish to represent it.  I haven't dug into the Klokan v4 spec 
in any detail to see whether that additional forms of pointwise data.  In terms 
of global data, the Klokan v2 format also includes the co-ordinate bounding 
boxes and centre co-ordinates, descriptions of the image source, timestamps, 
user stamps, versioning info, etc -- all of which we may wish to think about, 
even if some of this the wiki might store for us for free.
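  If per-point provenance were kept, the use cases above become simple queries.
The sketch below adds a source layer and zoom level to each GCP row; the field
names do not reproduce Klokan's actual schema. It also computes a bounding box
and centre, but note that this is the extent of the control points, which need
not match the bounding box Klokan stores for the map.

```python
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL per-point record: a GCP row plus provenance of the kind Klokan
# v2 records (which layer the point was matched against, at which zoom).
@dataclass
class GCP:
    pixel_x: float
    pixel_y: float
    lon: float
    lat: float
    source_layer: str
    source_zoom: int

def points_to_redo(gcps, bad_layer, zooms=None):
    """Points placed against a layer later found to be badly georeferenced,
    optionally restricted to particular zoom levels of that layer."""
    return [g for g in gcps
            if g.source_layer == bad_layer
            and (zooms is None or g.source_zoom in zooms)]

def zoom_usage(gcps):
    """Count points per (layer, zoom) to see which are used most."""
    return Counter((g.source_layer, g.source_zoom) for g in gcps)

def bbox_and_centre(gcps):
    """(west, south, east, north) of the control points, and its centre."""
    lons = [g.lon for g in gcps]
    lats = [g.lat for g in gcps]
    west, south, east, north = min(lons), min(lats), max(lons), max(lats)
    return (west, south, east, north), ((west + east) / 2, (south + north) / 2)
```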
  
  Versioning is a potential challenge we would need to think about, since
updates to the georeferencing might change all of: the GCP block, the
georectified image, and summary statements in the original image's SDC. To look
back at a previous version of the georeferencing, one would need to keep each
of these in sync. So statements linking, e.g., from the original image to the
GCP block or the georectified image (and vice versa) may need to link to a
specific version of a file, as it was at a particular time; otherwise "undo"s
of part of the data may cause real difficulties. This may be something we need
to think about; or perhaps it will be enough if the georeferencer app is aware
of the issue, and keeps look-backs synchronised.
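  One way to keep the parts in step would be to record each georeferencing
version as a set of pinned revision IDs. The sketch below is hypothetical (the
class, its fields and the in_sync check are illustrations only); the links it
builds use the standard MediaWiki index.php?title=...&oldid=... permalink form.

```python
from dataclasses import dataclass
from urllib.parse import quote

COMMONS = "https://commons.wikimedia.org/w/index.php"

# HYPOTHETICAL record of one georeferencing version: each related page is
# pinned to a specific revision id, so all three can be restored together.
@dataclass(frozen=True)
class GeorefVersion:
    original_file: str    # file page whose SDC holds the summary statements
    original_rev: int
    gcp_table: str        # Data: page holding the GCPs
    gcp_rev: int
    rectified_file: str   # materialised georectified image
    rectified_rev: int

    def pages(self):
        return [(self.original_file, self.original_rev),
                (self.gcp_table, self.gcp_rev),
                (self.rectified_file, self.rectified_rev)]

    def permalinks(self):
        """Permanent links to the pinned revisions."""
        return [f"{COMMONS}?title={quote(t.replace(' ', '_'), safe=':/')}&oldid={r}"
                for t, r in self.pages()]

def in_sync(version, current_revs):
    """True if the latest revisions of all three pages match this version.
    current_revs maps page title -> latest revision id."""
    return all(current_revs.get(t) == r for t, r in version.pages())
```

A georeferencer app could refuse, or warn about, an "undo" of one part when
in_sync fails for the others.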
  
  Above I have suggested storing the georectified map as a materialised file in
its own right on Commons. This is another thing we may need to think about.
Traditionally, I think, the warped versions of images may have been generated
on the fly by the warper. But for presentation on wiki pages, or for use in
external applications, it may be useful to have the georectified image cached
as an actual overlay image that can then be used directly, and more easily, by
all manner of third-party software or add-ons. The trade-off between additional
storage space and additional processing demand may need to be assessed. As an
additional consideration, georeferencer apps have typically offered multiple
warping options: e.g. affine transformation, global polynomial interpolation,
or local spline interpolation. One might even want to offer further options,
e.g. an angle-preserving rotation-scaling transformation, or projection
estimation and direct inversion, perhaps with additional interpolation on top
of that. In such cases, should one materialise and offer multiple georectified
alternatives? Or just one? Or allow them to be displayed in the georeferencer
app, with the user then having to explicitly "save" one to change the one
preserved? It is an additional complication, but I still think it is worth
thinking about.
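  As an illustration of one of the extra options mentioned, the
angle-preserving rotation-scaling transformation (a similarity or Helmert
transform) has a closed-form least-squares fit from the GCPs. This is a sketch,
not any existing warper's code; treating lon/lat as planar coordinates is a
simplifying assumption, and a real implementation would work in a projected
CRS. The RMS residual is one way methods could be compared before deciding
which result to materialise.

```python
import math

# Model: u = a*x - b*y + tx,  v = b*x + a*y + ty
# (x, y) = image pixels, (u, v) = map coordinates; scale = hypot(a, b),
# rotation = atan2(b, a). Fitted by least squares on centred coordinates.
def fit_similarity(gcps):
    """gcps: list of ((x, y), (u, v)) pairs; returns (a, b, tx, ty)."""
    n = len(gcps)
    if n < 2:
        raise ValueError("need at least two control points")
    xm = sum(p[0] for p, _ in gcps) / n
    ym = sum(p[1] for p, _ in gcps) / n
    um = sum(q[0] for _, q in gcps) / n
    vm = sum(q[1] for _, q in gcps) / n
    s = num_a = num_b = 0.0
    for (x, y), (u, v) in gcps:
        dx, dy, du, dv = x - xm, y - ym, u - um, v - vm
        s += dx * dx + dy * dy
        num_a += dx * du + dy * dv
        num_b += dx * dv - dy * du
    if s == 0:
        raise ValueError("control points must not all coincide")
    a, b = num_a / s, num_b / s
    return a, b, um - (a * xm - b * ym), vm - (b * xm + a * ym)

def apply_similarity(params, x, y):
    a, b, tx, ty = params
    return a * x - b * y + tx, b * x + a * y + ty

def rms_residual(params, gcps):
    """Root-mean-square distance between predicted and recorded map points."""
    total = 0.0
    for (x, y), (u, v) in gcps:
        pu, pv = apply_similarity(params, x, y)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(gcps))
```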
  
  A final thought: if we want to be able to produce demos of different data
modellings, it will be useful to be able to create new properties at will for a
test instance of Commons SDC at Wikimania. Does that mean attaching it to a
fully loaded (at least as regards properties) test instance of Wikidata? That
may be something to think about before the event, and not just for this
project. (E.g. if there were workshops to develop different potential SDC
modellings of GLAM metadata, they might want to be able to create test versions
of properties, too. Pinging @SandraF_WMF.)

TASK DETAIL
  https://phabricator.wikimedia.org/T227036

To: bert, Jheald
Cc: Jheald, Orienteerix, Abbe98, SandraF_WMF, Susannaanas, Aklapper, bert, 
darthmon_wmde, Ferenczy, DannyS712, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, 
E1presidente, Ramsey-WMF, Cparle, Anooprao, GoranSMilovanovic, QZanden, 
Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, 
Morgankevinj, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, 
Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
