Yup, still true. We do at least have a common goal of structured HTML,
as defined by http://schema.org/CreativeWork. It sounds like Tpt's
scraper will do wonders if a Wikisource just complies with that. I think
that's one of the next steps we need to take.
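For illustration, here is a minimal Python sketch of what reading that markup could look like. It assumes the header template emits schema.org microdata (`itemscope`, `itemtype`, `itemprop`, or `<meta itemprop="..." content="...">`). The sample HTML and the `MicrodataExtractor` class are hypothetical, not Tpt's actual scraper. Nested items, such as an author marked up as its own Person, are simply flattened.

```python
from html.parser import HTMLParser

# HTML elements that never get a closing tag, so they must not be pushed
# onto the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Collect itemprop values from schema.org microdata (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.item_type = None  # itemtype of the first itemscope seen
        self.props = {}        # itemprop name -> list of string values
        self._open = []        # per open element: [itemprop or None, text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a and self.item_type is None:
            self.item_type = a.get("itemtype")
        prop = a.get("itemprop")
        if prop and a.get("content") is not None:
            # <meta itemprop="..." content="...">: value comes from the attribute
            self.props.setdefault(prop, []).append(a["content"])
            prop = None
        if tag not in VOID_TAGS:
            self._open.append([prop, []])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        prop, parts = self._open.pop()
        text = "".join(parts)
        if prop and text.strip():
            self.props.setdefault(prop, []).append(text.strip())
        if self._open:
            # A child's text is also part of its parent's text content
            self._open[-1][1].append(text)

    def handle_data(self, data):
        if self._open:
            self._open[-1][1].append(data)


# Hypothetical header markup, for illustration only
sample = """
<div itemscope itemtype="http://schema.org/CreativeWork">
  <span itemprop="name">The Time Machine</span> by
  <span itemprop="author">H. G. Wells</span>
  <meta itemprop="datePublished" content="1895">
</div>
"""

extractor = MicrodataExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.item_type)  # http://schema.org/CreativeWork
print(extractor.props)
# {'name': ['The Time Machine'], 'author': ['H. G. Wells'], 'datePublished': ['1895']}
```

The point of the shared schema.org vocabulary is that the same extractor works unchanged on any Wikisource whose header emits these attributes. A per-site scraper is then no longer needed.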
I sort of figure, from the English Wikisource point of view, that we
should do more on bringing data *in* from Wikidata, in our {{header}},
rather than working on making it easier to extract data *out* with
microformats/structured HTML. Well, we should do both, of course! :-) But
my feeling from the process of getting Author data in from Wikidata is
that the whole Wikidata integration becomes so much more worthwhile and
clearer (and we sort out the various edge cases) when we're actively
using it for real.
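Here is a rough sketch of the "Wikidata first, scrape only what's missing" approach, the same fallback described for ws-search further down this thread. It is written in Python purely for illustration and is not the `wikisource/api` PHP library. `merge_metadata` and the field names are hypothetical. The fields that had to be filled from HTML are returned as well, as candidates for adding to Wikidata.

```python
from typing import Dict, List, Optional, Tuple

# field name -> value (None or "" means "not available")
Metadata = Dict[str, Optional[str]]


def merge_metadata(wikidata: Metadata, scraped: Metadata) -> Tuple[Dict[str, str], List[str]]:
    """Prefer Wikidata; use scraped HTML only for fields Wikidata lacks.

    Returns the merged record plus the fields that had to come from HTML,
    i.e. candidates for adding to Wikidata.
    """
    merged: Dict[str, str] = {}
    from_html: List[str] = []
    for field in sorted(set(wikidata) | set(scraped)):
        wd_value = wikidata.get(field)
        html_value = scraped.get(field)
        if wd_value:            # Wikidata wins whenever it has a value
            merged[field] = wd_value
        elif html_value:        # otherwise fall back to the scraped page
            merged[field] = html_value
            from_html.append(field)
    return merged, from_html


# Hypothetical inputs: one record from Wikidata, one scraped from the page
wd_record = {"title": "The Time Machine", "author": "H. G. Wells", "published": None}
scraped_record = {"title": "The Time Machine", "published": "1895", "publisher": "Heinemann"}

record, gaps = merge_metadata(wd_record, scraped_record)
print(record)
print(gaps)  # ['published', 'publisher']
```

As more data moves into Wikidata, the `gaps` list shrinks. Scraping then only matters for the long tail.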
But of course, each Wikisource is in a similar position. :-( And are we
all to be developing the Lua scripts and templates in isolation? Indeed
no! :-) We shall put them all together in our brave new Wikisource
extension! :)
—sam



On Wed, 1 Nov 2017, at 04:03 PM, Andrea Zanni wrote:
> @Sam, Tpt,
> my personal experience is also that HTML is the way to pull out the
> important Wikisource metadata, but also that every Wikisource has a
> somewhat different way of showing it, meaning that you need to tweak
> your scraper for each Wikisource.
> Is that still true? The last time I did it was more than a year ago,
> but I need to try it again soon.
> Aubrey
> 
> On Wed, Nov 1, 2017 at 1:00 AM, Sam Wilson
> <[email protected]> wrote:
>> Yes, I think you're definitely right! The easier way to send
>> Wikisource data to Wikidata is going to be a clever gadget that reads
>> the microformat or schema'd info in each page. My hack was just a
>> quick and easy test at getting some things added. :)
>> 
>> Ultimately, I'm actually not that excited about working on the tools
>> that we need to transfer the data. No, no, I don't mean that! Well,
>> just that the end point we're aiming at is that a bunch of info
>> *won't be* in Wikisource at all, but will be pulled from Wikidata, and
>> so I am much more interested in making better tools for working with
>> the data in Wikidata. :-) If you see what I mean.
>> 
>> My idea with ws-search is that it will progressively pull more and
>> more data from Wikidata, and only resort to HTML scraping where the
>> data is missing from Wikidata. I'm attempting to encapsulate this
>> logic in the `wikisource/api` PHP library.
>> 
>> 
>> 
>> On Tue, 31 Oct 2017, at 11:14 PM, Thomas Pellissier Tanon wrote:
>>  > Hello Sam,
>>  >
>>  > Thank you for this nice feature!
>>  >
>>  > A few months ago I created a prototype of a Wikisource-to-Wikidata
>>  > import tool for the French Wikisource, based on the schema.org
>>  > annotations I added to the main header template (I definitely think
>>  > we should move from our custom microformat to this schema.org markup,
>>  > which can be much more structured). It's not ready yet, but I plan to
>>  > move it forward in the coming weeks. The beginnings of a frontend,
>>  > which you can add to your Wikidata common.js, are here:
>>  > https://www.wikidata.org/wiki/User:Tpt/ws2wd.js
>>  > We should probably find a way to merge the two projects.
>>  >
>>  > Cheers,
>>  >
>>  > Thomas
>>  >
>>  > > On 31 Oct 2017, at 15:10, Nicolas VIGNERON
>>  > > <[email protected]> wrote:
>>  > >
>>  > > 2017-10-31 13:16 GMT+01:00 Jane Darnell <[email protected]>:
>>  > > Sorry, I am much more of a Wikidatan than a Wikisourcerer! I was
>>  > > referring to items like this one
>>  > > https://www.wikidata.org/wiki/Q21125368
>>  > >
>>  > > No need to be sorry, that is actually a good question, and this
>>  > > example is even better (I had totally forgotten this kind of case).
>>  > >
>>  > > For now, it is probably better to deal with it by hand (and
>>  > > I'm not sure what this tool can even do for this).
>>  > >
>>  > > Regards, ~nicolas
>>  > > _______________________________________________
>>  > > Wikisource-l mailing list
>>  > > [email protected]
>>  > > https://lists.wikimedia.org/mailman/listinfo/wikisource-l