jayvdb created this task.
jayvdb added a subscriber: jayvdb.
jayvdb added a project: Pywikibot-Wikidata.
jayvdb changed Security from none to none.

TASK DESCRIPTION
  Some data in Wikipedia is easier to extract from the rendered HTML than from
the templates, because the templates put the values into microformats.  Other
web pages that use microformats could also be used to extract information and
add it to Wikidata.  I expect this should be done in a new script, based on
the script harvest_templates.py.
  
  See https://en.wikipedia.org/wiki/Help:Microformats .
  
  Birth and death dates are good examples: on English Wikipedia they are placed
in dedicated spans (classes "bday" and "dday") using a constant YYYY-MM-DD format.
  
  view-source:https://en.wikipedia.org/wiki/Benjamin_Franklin
  
  <span class="bday">1706-01-17</span>
  <span class="dday deathdate">1790-04-17</span>
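  Because these spans use a fixed class name and an ISO-style date, reading
them needs no template parsing at all. A minimal sketch, assuming only
Python's standard html.parser (not Pywikibot, and not how harvest_templates.py
works); LifeDateParser and extract_life_dates are hypothetical names:

```python
from datetime import date
from html.parser import HTMLParser


class LifeDateParser(HTMLParser):
    """Hypothetical helper: collect the text of <span> elements whose
    class list contains 'bday' or 'dday' (first occurrence of each)."""

    WANTED = ("bday", "dday")

    def __init__(self):
        super().__init__()
        self.found = {}       # class name -> raw text
        self._current = None  # 'bday' or 'dday' while inside such a span
        self._depth = 0       # span nesting depth inside the current date span
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            self._depth += 1  # a span nested inside a date span
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in self.WANTED:
            if name in classes:  # e.g. class="dday deathdate" matches 'dday'
                self._current, self._depth, self._text = name, 1, []
                break

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.found.setdefault(self._current, "".join(self._text).strip())
            self._current = None


def extract_life_dates(html):
    """Return {'bday': date, 'dday': date} for every parseable date found."""
    parser = LifeDateParser()
    parser.feed(html)
    parser.close()
    dates = {}
    for name, text in parser.found.items():
        try:
            dates[name] = date.fromisoformat(text)  # expects YYYY-MM-DD
        except ValueError:
            pass  # skip partial or out-of-range values rather than guess
    return dates
```

  Fed the two spans above, extract_life_dates returns date(1706, 1, 17) under
"bday" and date(1790, 4, 17) under "dday". Python's date type only covers
years 1 to 9999, so anything outside that range is skipped.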
  
  The {{Persondata}} template is relatively easy to parse, but its output is
also well labelled in the HTML:
<https://en.wikipedia.org/wiki/Wikipedia:Persondata>
  
  <table id="persondata" class="persondata noprint" style="border:1px solid #aaa; display:none; speak:none;">
  <tr>
  <th colspan="2"><a href="/wiki/Wikipedia:Persondata" title="Wikipedia:Persondata">Persondata</a></th>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Name</td>
  <td>Franklin, Benjamin</td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Alternative names</td>
  <td></td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Short description</td>
  <td>American printer, writer, politician</td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Date of birth</td>
  <td>January 17, 1706</td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Place of birth</td>
  <td>Boston, Massachusetts</td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Date of death</td>
  <td>April 17, 1790</td>
  </tr>
  <tr>
  <td class="persondata-label" style="color:#aaa;">Place of death</td>
  <td><a href="/wiki/Philadelphia" title="Philadelphia">Philadelphia</a>, Pennsylvania</td>
  </tr>
  </table>
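  Since every data row pairs a <td class="persondata-label"> with one value
cell, the table reduces to a label/value dictionary. A minimal sketch using
Python's standard html.parser, assuming the layout shown above and no nested
tables; PersondataParser and extract_persondata are hypothetical names:

```python
from html.parser import HTMLParser


class PersondataParser(HTMLParser):
    """Hypothetical helper: read label/value pairs from <table id="persondata">.

    Assumes each row is a <td class="persondata-label"> followed by a value <td>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_table = False
        self._cell = None   # "label" or "value" while inside a <td>
        self._text = []
        self._label = None  # last label seen, waiting for its value cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("id") == "persondata":
            self._in_table = True
        elif self._in_table and tag == "td":
            classes = (attrs.get("class") or "").split()
            self._cell = "label" if "persondata-label" in classes else "value"
            self._text = []

    def handle_data(self, data):
        if self._cell is not None:
            self._text.append(data)  # includes text inside <a> links

    def handle_endtag(self, tag):
        if tag == "table" and self._in_table:
            self._in_table = False
        elif tag == "td" and self._cell is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if self._cell == "label":
                self._label = text
            elif self._label is not None:
                self.fields[self._label] = text
                self._label = None
            self._cell = None


def extract_persondata(html):
    parser = PersondataParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

  For the table above this yields entries such as "Date of birth" mapped to
"January 17, 1706" and "Place of death" mapped to "Philadelphia, Pennsylvania".
The values are still free text, so they would need further parsing before
being added to Wikidata.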
  
  More at https://en.wikipedia.org/wiki/Wikipedia:Metadata
  
  A list of templates which generate microformats is at
https://en.wikipedia.org/wiki/Category:Templates_generating_microformats , and
sample pages using each template can be found with Special:WhatLinksHere.
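  Finding sample pages can also be done programmatically. A minimal sketch,
assuming the MediaWiki Action API's list=embeddedin module (which lists pages
that transclude a template) is queried directly; a Pywikibot script would more
likely use the framework's own page generators. "Template:Birth date" is only
an example name, and the helper names are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical helpers; the endpoint and parameters are the standard
# MediaWiki Action API ones for list=embeddedin.
API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_query_url(template, namespace=0, limit=50):
    """Build an API URL listing pages that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,       # full title, including "Template:"
        "einamespace": namespace,  # 0 = articles only
        "eilimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(data):
    """Pull page titles out of a decoded list=embeddedin JSON response."""
    return [page["title"] for page in data.get("query", {}).get("embeddedin", [])]
```

  This sketch ignores paging: when more than `limit` pages match, the API
returns a "continue" object whose values must be sent with the next request.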
  
  For example, an hCard (class "vcard") with "fn" and "org" properties can be seen in the source of the infobox here:
  
  view-source:https://en.wikipedia.org/wiki/Manchester_Ship_Canal
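  hCard properties can sit on any element inside the "vcard" container, and
one element may carry several of them (class="fn org" marks the formatted name
as an organisation name). A minimal sketch using Python's standard html.parser,
assuming well-formed markup with explicit end tags; the sample in the test is
simplified and not copied from the article, and HCardParser and extract_hcards
are hypothetical names:

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not go on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
HCARD_PROPS = ("fn", "org")


class HCardParser(HTMLParser):
    """Hypothetical helper: collect 'fn'/'org' text inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per vcard: property -> text pieces
        self._open = []        # per open element: (starts_card, props)
        self._cards_open = []  # vcards we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        starts_card = "vcard" in classes
        if starts_card:
            self.cards.append({})
            self._cards_open.append(self.cards[-1])
        props = tuple(p for p in HCARD_PROPS if p in classes) if self._cards_open else ()
        self._open.append((starts_card, props))

    def handle_data(self, data):
        if not self._cards_open:
            return
        # Text belongs to the innermost open element that carries a property.
        for _, props in reversed(self._open):
            if props:
                for prop in props:
                    self._cards_open[-1].setdefault(prop, []).append(data)
                break

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._open:
            return
        starts_card, _ = self._open.pop()
        if starts_card:
            self._cards_open.pop()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    # Join the text pieces of each property and collapse whitespace.
    return [{prop: " ".join("".join(parts).split()) for prop, parts in card.items()}
            for card in parser.cards]
```

  For an infobox whose title cell is <th class="fn org">Manchester Ship
Canal</th>, this returns one card with both "fn" and "org" set to "Manchester
Ship Canal". Real article HTML with omitted end tags (for example bare <li> or
<p>) would need a more forgiving parser.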

TASK DETAIL
  https://phabricator.wikimedia.org/T78416

To: jayvdb
Cc: Aklapper, jayvdb, pywikipedia-bugs



_______________________________________________
Pywikipedia-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-bugs
