Hello Andrea,

> I guess it could be very useful to use for importing those data into Wikidata.

Even if we remove the OAI-PMH API, we could still extract the data from the 
Index: page serialization. It's a bit more difficult, but not much more (and 
definitely far less work than the entity-matching problem).
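To give an idea of what that extraction could look like, here is a minimal sketch in Python. The field names and sample wikitext are purely illustrative; it assumes the usual ProofreadPage layout where the Index: page stores its metadata as one `|Field=value` pair per line:

```python
import re

def parse_index_fields(wikitext):
    """Extract |Field=value pairs from an Index: page's wikitext.

    Minimal sketch: assumes each field sits on its own line as
    '|Name=value', the usual layout of ProofreadPage Index: pages.
    Multi-line field values are not handled.
    """
    fields = {}
    for match in re.finditer(r"^\|(\w+)=(.*)$", wikitext, re.MULTILINE):
        fields[match.group(1)] = match.group(2).strip()
    return fields

# Illustrative sample, not a real Index: page.
sample = """{{:MediaWiki:Proofreadpage_index_template
|Title=[[The Example Book]]
|Author=[[Author:Jane Doe|Jane Doe]]
|Year=1890
}}"""
print(parse_index_fields(sample)["Year"])  # 1890
```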

>  The problem with those API is that it works only it Index pages, which are 
> only a fraction of the "book" entity on Wikisource. Index pages are not 
> linked in a structured way with their ns0 pages, and this is a problem for us.

It's possible to retrieve the ns0 pages that use a given Index: page via the 
<pages> tag: you just have to retrieve the list of transclusions of the Index: 
page, as if it were a regular template.
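Concretely, that transclusion list is what the MediaWiki Action API exposes as `prop=transcludedin` (with `tinamespace=0` to keep only ns0 pages). A small sketch, where the helper names, endpoint, Index: title, and sample response are all illustrative:

```python
from urllib.parse import urlencode

def transcludedin_url(api_endpoint, index_title):
    """Build an Action API URL listing the ns0 pages that transclude
    the given Index: page (as if it were a regular template)."""
    params = {
        "action": "query",
        "prop": "transcludedin",
        "titles": index_title,
        "tinamespace": 0,   # main (ns0) namespace only
        "tilimit": "max",
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)

def ns0_pages(api_response):
    """Extract the ns0 page titles from a prop=transcludedin response."""
    titles = []
    for page in api_response["query"]["pages"].values():
        for t in page.get("transcludedin", []):
            titles.append(t["title"])
    return titles

# Illustrative response shape, not real data.
sample_response = {
    "query": {
        "pages": {
            "123": {
                "title": "Index:Example.djvu",
                "transcludedin": [
                    {"pageid": 1, "ns": 0, "title": "The Example Book"}
                ],
            }
        }
    }
}
print(ns0_pages(sample_response))  # ['The Example Book']
```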

> Ideally, we would know when a Index page has only one ns0 page, and we would 
> use the same set of data to create an entity (or more) into Wikidata.

Yes. What we could do is check whether the "Title" field of the Index: page 
contains only one link to a ns0 page and consider that the "one" ns0 page. 
Another possibility, when the header feature of the <pages> tag is used, is to 
retrieve the pages that use the automatic summary feature and, if there is 
only one, consider it the "one".
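The first heuristic can be sketched in a few lines of Python (the function name is mine, and the link-parsing regex is a simplification that ignores namespaced links such as Author: pages):

```python
import re

def single_ns0_link(title_field):
    """Return the single ns0 page linked from the Index: page's
    "Title" field, or None if there is not exactly one candidate."""
    # Match [[Target]] or [[Target|Label]] wikilinks.
    links = [
        m.group(1)
        for m in re.finditer(r"\[\[([^\]|#]+)(?:\|[^\]]*)?\]\]", title_field)
    ]
    # Simplification: treat any link containing ':' as namespaced, not ns0.
    ns0 = [link for link in links if ":" not in link]
    return ns0[0] if len(ns0) == 1 else None

print(single_ns0_link("[[The Example Book]]"))      # The Example Book
print(single_ns0_link("[[Vol. 1]] and [[Vol. 2]]"))  # None
```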

> and I don't know if that uses your API.

I believe it doesn't, but we should definitely ask him whether it would be 
useful for his use case.

Thomas

> Le 31 déc. 2016 à 12:58, Andrea Zanni <[email protected]> a écrit :
> 
> Hi Thomas.
> 
> I used, one year ago, the API: I downloaded the data from the Index pages, 
> and I think that it would be good to have it while we still don't have 
> Wikidata.
> I guess it could be very useful to use for importing those data into Wikidata.
> 
> The problem with those API is that it works only it Index pages, which are 
> only a fraction of the "book" entity on Wikisource. Index pages are not 
> linked in a structured way with their ns0 pages, and this is a problem for us.
> 
> Ideally, we would know when a Index page has only one ns0 page, and we would 
> use the same set of data to create an entity (or more) into Wikidata.
> 
> I know that Sam is trying to develop a similar tool:
> https://tools.wmflabs.org/ws-search/
> and I don't know if that uses your API.
> 
> Aubrey
> 
> On Fri, Dec 30, 2016 at 6:15 PM, Thomas PT <[email protected]> wrote:
> I definitely used the pageviews API. So I understand now why the count was 0. 
> Sorry for the false info and thank you for your correction.
> 
> But my proposal still stands as I do not know any actual user of the API.
> 
> Thomas
> 
> > Le 30 déc. 2016 à 18:11, Federico Leva (Nemo) <[email protected]> a écrit :
> >
> > Sorry for the double message.
> >
> > Thomas PT, 30/12/2016 17:31:
> >> According to the Wikimedia PageView statistic tool
> >
> > Did you literally use https://tools.wmflabs.org/pageviews , or have you 
> > asked for real requests data? The pageviews API doesn't count requests to 
> > the OAI-PMH endpoint at all, because they have "content-type: text/xml" 
> > while text/html is required: 
> > https://meta.wikimedia.org/wiki/Research:Page_view#Definition
> >
> > Only people with access to 
> > https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#wmf.webrequest
> >  can extract data on how much it's used.
> >
> > Nemo
> >
> > _______________________________________________
> > Wikisource-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikisource-l
> 


