Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On 23-Apr-14 21:29, wikitech-l-requ...@lists.wikimedia.org wrote:
> Re: API attribute ID for querying wikipedia pages

@Matma Rex: This is way too general; I think it would be a lot better if this were more detailed. For example, when I want to fetch the table of all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I make an API call like this:

https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circulating%20currencies&prop=sections&format=jsonfm

This returns 5 sections with numbers that I can use as reference points, but I would rather have a number for the table within the section, since a section can contain multiple tables.

Querying specific (structured) data from Wikipedia is still very difficult, in my opinion. My suggestion is that every paragraph, image, link and table gets a uniquely identifiable number. This would make Wikipedia more machine-readable.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
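For readers following along, the sections query in the message above can be assembled from its individual parameters. The following is only a minimal sketch: the helper name `sections_query_url` and the constant `API_ENDPOINT` are made up for illustration, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode, quote

# English Wikipedia API endpoint, as used in the message above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def sections_query_url(page_title, fmt="json"):
    """Build an action=parse URL that asks only for a page's section list.

    The function name is hypothetical; the parameters (action, page,
    prop, format) are the ones that appear in the thread.
    """
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "sections",
        "format": fmt,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = sections_query_url("List of circulating currencies", fmt="jsonfm")
print(url)
```

The `jsonfm` format is the pretty-printed variant meant for viewing in a browser; a script would normally ask for `json` instead, which is why that is the default in this sketch.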
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On Thu, Apr 24, 2014 at 2:24 PM, Daan Kuijsten daankuijs...@gmail.com wrote:
> On 23-Apr-14 21:29, wikitech-l-requ...@lists.wikimedia.org wrote:
>> Re: API attribute ID for querying wikipedia pages
>
> @Matma Rex: This is way too general; I think it would be a lot better if this were more detailed. For example, when I want to fetch the table of all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I make an API call like this:
>
> https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circulating%20currencies&prop=sections&format=jsonfm
>
> This returns 5 sections with numbers that I can use as reference points, but I would rather have a number for the table within the section, since a section can contain multiple tables.
>
> Querying specific (structured) data from Wikipedia is still very difficult, in my opinion. My suggestion is that every paragraph, image, link and table gets a uniquely identifiable number. This would make Wikipedia more machine-readable.

I see where you are coming from, but this implies that these elements are stable across multiple revisions, which they aren't. If I have a table in revision 1, remove it in revision 2, and add it back in revision 3, is it still the same table? What if I change it slightly? How much do I have to change it before its identity changes?

A wiki(pedia) page is by its very nature a dynamic construct, and assigning stable identifiers to its elements would be extremely impractical, to say the least.
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On Thu, 24 Apr 2014 14:24:08 +0200, Daan Kuijsten daankuijs...@gmail.com wrote:
> Querying specific (structured) data from Wikipedia is still very difficult, in my opinion. My suggestion is that every paragraph, image, link and table gets a uniquely identifiable number. This would make Wikipedia more machine-readable.

You want Semantic MediaWiki[1] then (which the Wikipedias don't use) or Wikidata[2], which is one of Wikipedia's sister projects and has been growing very fast.

Wikipedia was never intended to be machine-readable in the way you propose (although it does provide access to MediaWiki's awesome API).

[1] https://www.mediawiki.org/wiki/Extension:Semantic_MediaWiki
[2] https://www.wikidata.org/

--
Matma Rex
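To illustrate what "structured" means in the Wikidata case, here is a rough sketch that reads plain string values out of a response shaped like the output of Wikidata's `wbgetentities` API module. Everything in `SAMPLE_RESPONSE` is invented for illustration: the keys `Q1` and `P1` are placeholders and are not meant to name real entities or properties, and the helper `string_claim_values` is hypothetical. No network request is made.

```python
# Hypothetical, heavily trimmed response in the general shape of
# action=wbgetentities&props=claims&format=json. The IDs and the
# value are placeholders, not real Wikidata content.
SAMPLE_RESPONSE = {
    "entities": {
        "Q1": {
            "id": "Q1",
            "claims": {
                "P1": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P1",
                            "datavalue": {"value": "EUR", "type": "string"},
                        },
                        "type": "statement",
                        "rank": "normal",
                    }
                ]
            },
        }
    }
}


def string_claim_values(response, entity_id, property_id):
    """Collect the plain string values stated for one property of one entity."""
    entity = response["entities"][entity_id]
    values = []
    for statement in entity.get("claims", {}).get(property_id, []):
        snak = statement["mainsnak"]
        # Skip "novalue"/"somevalue" snaks and non-string datatypes.
        if snak.get("snaktype") == "value" and snak["datavalue"]["type"] == "string":
            values.append(snak["datavalue"]["value"])
    return values


print(string_claim_values(SAMPLE_RESPONSE, "Q1", "P1"))
```

The point of the sketch is that each statement is addressed by entity ID and property ID rather than by its position in rendered page text, which is the kind of addressing the original request was after.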
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
Hoi,
I totally agree that you should be able to do this. However, would it not make more sense to get structured information from Wikidata?
Thanks,
GerardM

On 24 April 2014 14:24, Daan Kuijsten daankuijs...@gmail.com wrote:
> On 23-Apr-14 21:29, wikitech-l-requ...@lists.wikimedia.org wrote:
>> Re: API attribute ID for querying wikipedia pages
>
> @Matma Rex: This is way too general; I think it would be a lot better if this were more detailed. For example, when I want to fetch the table of all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I make an API call like this:
>
> https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circulating%20currencies&prop=sections&format=jsonfm
>
> This returns 5 sections with numbers that I can use as reference points, but I would rather have a number for the table within the section, since a section can contain multiple tables.
>
> Querying specific (structured) data from Wikipedia is still very difficult, in my opinion. My suggestion is that every paragraph, image, link and table gets a uniquely identifiable number. This would make Wikipedia more machine-readable.
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On 04/24/2014 05:24 AM, Daan Kuijsten wrote:
> On 23-Apr-14 21:29, wikitech-l-requ...@lists.wikimedia.org wrote:
>> Re: API attribute ID for querying wikipedia pages
>
> @Matma Rex: This is way too general; I think it would be a lot better if this were more detailed. For example, when I want to fetch the table of all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I make an API call like this:
>
> https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circulating%20currencies&prop=sections&format=jsonfm
>
> This returns 5 sections with numbers that I can use as reference points, but I would rather have a number for the table within the section, since a section can contain multiple tables.
>
> Querying specific (structured) data from Wikipedia is still very difficult, in my opinion. My suggestion is that every paragraph, image, link and table gets a uniquely identifiable number. This would make Wikipedia more machine-readable.

We (the Parsoid team) are actually working on this; see
https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs

Besides making it possible to reference content, our goal is to use these IDs as keys that let us associate additional metadata with each element in the DOM. We expect stable element IDs to be available in Parsoid output by this summer.

Gabriel
[Wikitech-l] API attribute ID for querying wikipedia pages
We are currently experiencing problems when we try to query Wikipedia. In our opinion, fetching content via the Wikipedia API could be a lot easier.

The problem is that content is fetched via the rvsection parameter, which accepts a number representing the section's position, counted from the top of the page to the bottom. This is a very dangerous way of fetching content: when another section is inserted at the top of the page, all section numbers move up by one.

A better way of fetching content via an API is to assign a unique ID to a section, a paragraph, a table, an image, etc. This way we could simply fetch a part of the content of Wikipedia via this ID.

I would like to know whether this problem is shared by other developers on the Wikipedia API team.

Kind regards,
Daan Kuijsten
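The shifting described above is easy to reproduce without touching the API. The sketch below is a toy model only: the heading names are invented, and `number_sections` is a hypothetical helper that mimics positional numbering rather than anything MediaWiki actually exposes.

```python
def number_sections(headings):
    """Assign 1-based positional numbers to headings, top to bottom."""
    return {position: heading for position, heading in enumerate(headings, start=1)}


# Invented headings for a page before and after an edit.
before = ["History", "Currencies", "References"]
after = ["Overview"] + before  # an editor inserts a new section at the top

# A client that stored "section 2" while the page looked like `before`...
stored_position = 2
print(number_sections(before)[stored_position])  # Currencies
# ...silently gets a different section once the page looks like `after`.
print(number_sections(after)[stored_position])   # History
```

Nothing fails in this scenario, which is what makes it risky: the request still succeeds, it just returns the wrong section.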
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On Wed, Apr 23, 2014 at 3:48 AM, Daan Kuijsten daankuijs...@gmail.com wrote:
> A better way of fetching content via an API is to assign a unique ID to a section, a paragraph, a table, an image, etc. This way we could simply fetch a part of the content of Wikipedia via this ID.

That doesn't sound much better. Say a vandal blanks a page and then someone reverts it: probably all your unique ID numbers will have changed. Or someone renames a section, or edits a paragraph, or combines two sections, or splits a section into two, etc.

--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
Re: [Wikitech-l] API attribute ID for querying wikipedia pages
On Wed, 23 Apr 2014 09:48:17 +0200, Daan Kuijsten daankuijs...@gmail.com wrote:
> A better way of fetching content via an API is to assign a unique ID to a section, a paragraph, a table, an image, etc. This way we could simply fetch a part of the content of Wikipedia via this ID.

Such IDs already exist, and they are present in the page HTML as 'id' attributes on the headings. They are constructed simply from the heading text, with unique identifiers appended when duplicates occur.

You can access these via the API too, using action=parse&prop=sections [1] (the 'anchor' property), then map them to the numerical identifiers other API modules use (the 'number' property).

[1] https://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&prop=sections&format=jsonfm

--
Matma Rex
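The anchor-to-number mapping suggested above could be sketched roughly as follows. This works on an invented, trimmed-down response in the shape of `action=parse&prop=sections&format=json`; the section titles are made up, and `section_index_for_anchor` is a hypothetical helper. One assumption to flag: besides 'number' (the table-of-contents number), each entry also carries an 'index' field, and this sketch assumes 'index' is the value that positional parameters such as rvsection expect.

```python
import json

# Hypothetical sample response; real responses contain more fields
# (for example 'fromtitle' and 'byteoffset') and real page content.
SAMPLE_RESPONSE = json.loads("""
{"parse": {"title": "Example", "sections": [
  {"toclevel": 1, "level": "2", "line": "History", "number": "1",   "index": "1", "anchor": "History"},
  {"toclevel": 1, "level": "2", "line": "Notes",   "number": "2",   "index": "2", "anchor": "Notes"},
  {"toclevel": 2, "level": "3", "line": "Notes",   "number": "2.1", "index": "3", "anchor": "Notes_2"}
]}}
""")


def section_index_for_anchor(response, anchor):
    """Return the 'index' of the section whose 'anchor' matches, or None."""
    for section in response["parse"]["sections"]:
        if section["anchor"] == anchor:
            return section["index"]
    return None


# Look the section up by its anchor at request time instead of
# hard-coding a position that may shift after an edit.
print(section_index_for_anchor(SAMPLE_RESPONSE, "Notes_2"))
```

Resolving the anchor immediately before each fetch survives sections being inserted or reordered, but not a heading being renamed, which is the limitation raised elsewhere in this thread.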