Re: [Wikidata-l] Multilinguistics matters
Hey :)

On Sat, Mar 31, 2012 at 2:18 PM, JFC Morfin jef...@jefsey.com wrote:

Lydia, I have a question. I am interested in multilinguistics. I define multilinguistics as the cybernetics of mecalanguages, i.e. the operational, computer-application, and strategic/pragmatic coexistence of natural (and artificial) languages that can be used between man/man/machine/machine. I am, therefore, interested in stabilizing a table of all the (meca)language names and specifics, and I have been working for years on the preparation of a WikiLinguae project, trying to get the ISO, private, and open-source table teams to converge and cooperate within the loose framework of the MAAYA network (members: http://maaya.org/spip.php?article40, including UNESCO, ITU, ACALAN, LinguaMon, Union Latine, Francophonie, AUF, etc.), as well as to work on script homography, which is a DNS problem, etc. This problem extends to the consistent support of IDNA2008 (IDNs), e-mail addresses, variants (i.e. the same term with different printings, or different Unicode code points), polynymy issues (i.e. strict cross-language synonyms), and variances (i.e. the variation of the semantics of a term, or the appearance of a new term), both in data definitions and in data values. I would like to know if this area is part of the Wikidata project, or how it is planned to be addressed.

Hmmm, I have to confess I don't understand this completely and therefore can't give you an answer. Could you give me an example of what you are talking about and how you see Wikidata fitting in?

I would also like to know, from the very beginning, if this WikiLinguae project is to be kept separate, should/could ally with Wikidata, or if Wikidata has its own project regarding multilinguistics (*). I note here that it should be both multilinguistic (to document every language) and polylingual (to document them in every language). In addition, should there be a Wikimedia extension/version of the locale files that would be documented cooperatively (in cooperation or not with other locale directories)?

Thank you.
jfc

(*) Multilinguistics is by nature a semiotic discipline with semantics, syntax, and pragmatics. It treats every language as equal to the others. It should not be confused with globalization (e.g. Unicode), which is:
* the internationalization of the medium (support of ISO 10646) within an English framework, plus the localization of the ends (ISO 15897), plus the filtering of linguistic exchanges as per their language (ISO 639), script (ISO 15924), and administrative authority as a cultural referent (ISO 3166-1),
* and results in langtags (RFC 5646).

Globalization and an open, langtag-compatible format should be sufficient at this stage (due to the limited number of Wikipedia languages) if there is no specific need for Wikimedia formats or locale file extensions. Also, the WikiLinguae project seeks to be a wiki-based, ISO 11179-conformant reference spine in the matter of languages and cultures: this is not a small task and may still call for time.

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata
Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
www.wikimedia.de
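As a minimal sketch of how the langtags mentioned above (RFC 5646) combine the ISO subtags JFC lists, i.e. an ISO 639 language code, an ISO 15924 script code, and an ISO 3166-1 region code; the concrete subtag values below are illustrative only:

```python
# Assemble a BCP 47 / RFC 5646 language tag from its ISO subtags.
# The subtag values used in the examples are illustrative, not prescriptive.
def make_langtag(language, script=None, region=None):
    """language: ISO 639 code, script: ISO 15924 code, region: ISO 3166-1 code."""
    parts = [language.lower()]
    if script:
        parts.append(script.title())   # script subtags are conventionally title-cased
    if region:
        parts.append(region.upper())   # region subtags are conventionally upper-cased
    return "-".join(parts)

print(make_langtag("sr", "Cyrl", "RS"))  # -> "sr-Cyrl-RS" (Serbian, Cyrillic script, Serbia)
print(make_langtag("zh", "Hant"))        # -> "zh-Hant"
```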
[Wikidata-l] Conservadata
Since some claim that Wikidata's model of representing statements with their references and the diversity of knowledge in the world outside is just a way to bring Wikipedia's values to the world of data, Conservapedia has launched an alternative approach to the challenges we are tackling. In good open-source manner we hope that we can share code, and we wish Conservadata all the best.

http://blog.tommorris.org/post/20277406012/conservadata

Cheers,
Denny

--
Project director Wikidata
Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Re: [Wikidata-l] Notability in Wikidata
In general, policies for notability in Wikidata will be governed by the community of (all) Wikidata editors. On the technical side, we aim to achieve two things:

* The system should be able to handle a lot of data.
* The interfaces and data access features should minimize the negative impact that additional (correct but not very important) data has on usage.

Of course, both goals have their limits, and there will always be good (technical or social) reasons not to include everything. We would rather support linking and data integration with external databases than suggest that *every* fact of the world be copied to Wikidata.

Markus

On 31/03/12 20:22, emijrp wrote:

Hi all;

I'm thinking about notability in Wikidata and how it may conflict with current Wikipedia policies and community conceptions. Will Wikidata allow the creation of entities for small villages, asteroids, galaxies, stars, species, etc., that are not allowed today at Wikipedia, including those that don't have an article in any Wikipedia? I will be happy if so.

Regards,
emijrp

--
Dr. Markus Kroetzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529 http://korrekt.org/
Re: [Wikidata-l] Multilinguistics matters
Dear Lydia,

Hmmm, I have to confess I don't understand this completely and therefore can't give you an answer. Could you give me an example of what you are talking about and how you see Wikidata fitting in?

Hmmm :-) I am somewhat at a loss here. Let me start from some basics, then.

1) Is there a concise, consensual definition of what Wikidata is expected to achieve?
2) Does this Wikidata project have a charter or terms of reference (TOR), or a dedicated web site whose URL you could give? http://wikidata.org is redirected to http://en.wikipedia.org/wiki/Main_Page
3) Is there a list of the needs of Wikimedia projects that Wikidata is to address?
4) What is/are going to be the language(s) of Wikidata?
5) What is Wikidata's architectural framework? WDE (whole digital ecosystem)? Human communications? The Internet services? Users' applications? Wikimedia?
6) What are the normative choices, strategic objectives, and constraints imposed by the sponsors and the WMF?
7) How is the Wikidata project to liaise with its potential users' representatives (depending on the response to (5))?

Thank you!
jfc
Re: [Wikidata-l] Data model (RDF)
On Sun, 1 Apr 2012, Markus Krötzsch wrote:

A very interesting discussion. Some general answers to this are:

* Wikidata does, of course, not intend to implement complex reasoning (or any other algorithm that qualifies as complex).
* If useful for serving its requirements, Wikidata will not exclude modelling features just because they are also supported in OWL ;-) For example, it could be useful to say that a Wikidata item describes the same thing as an external resource, which can be done in OWL using sameAs. Many communities could use this for integrating Wikidata information with other Web databases.
* The reasoning support in Wikidata will not in general limit the modelling support in Wikidata: it might be possible to say something that has a formal meaning in OWL, even if this formal meaning is not relevant for query answering in Wikidata (sameAs with external resources is a possible example, since Wikidata would surely not pull data from these sources for internal query answering).
* Wikidata will support various export formats, which have more or less native support for certain modelling features. We will use whatever expressivity is available in the given format to describe the Wikidata information as accurately as possible. This might again lead to some OWL constructs being used in RDF/OWL exports. All Wikidata content will have a formal meaning, and we will draw from existing experience and standards for defining this so that it is as widely compatible as possible.

In summary, it is not about endorsing or rejecting a particular ontology language. We will be open and inclusive with what we support, and user requirements will be the main guideline for defining what can be said in the system.

This sounds good to me. (Not in the least because this would allow data representations that are optimized for certain classes of knowledge, e.g. Topic Maps, or mathematical/physical relationships.) But it triggers the obvious question: when and how will such discussions (and decision making) be done in the course of the coming year?

Best regards, Markus

Best regards, Herman Bruyninckx

On 01/04/12 08:54, Ivan Herman wrote:

On Mar 31, 2012, at 11:17, Jakob Voss wrote:

JFC Morfin wrote: 2. Since we have a W3C expert: what is the best document/book to get a comprehensive and clear (not too massive) documentation on the semantic web?

You surely don't want to know all about the Semantic Web - especially the ontology stuff with OWL dialects and entailment regimes is far too academic and won't be part of Wikidata because of computational complexity anyway. In short, you should be *very sceptical* and cautious every time you stumble upon anything that requires inference rules. Even trivial inference rules such as those based on owl:sameAs and rdf:type can be problematic in practice! The less inference you assume, the better.

Let us avoid the all-too-simplistic view that says Semantic Web == OWL :-) Indeed, bringing (OWL) inferencing into the core WD project would be a mistake. From the SW stack, RDF, RDFS and, on a different note, SPARQL and maybe RDB2RDF should be the technologies having a role in the project, as well as Linked Data patterns in general. That being said, it is probably good to have the vocabularies being used in WD properly defined/described. If *somebody else* wants to do inferencing, for example, we should not stand in the way.

Ivan

I can recommend the Linked Data Patterns book by Dodds and Davis: http://patterns.dataincubator.org/book/

Indeed.
That is a great one, too.

Ivan

Jakob
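To make the sameAs point in Markus's mail concrete, here is a minimal sketch in Python using rdflib; the item URI, the external resource, and the choice of predicate are illustrative assumptions, not an endorsed Wikidata export format. It only states an identity link; nothing obliges Wikidata to reason over it.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# Hypothetical URIs: a Wikidata item and a corresponding external (DBpedia) resource.
# The exact URI scheme for Wikidata items was not fixed at the time of this thread.
wikidata_item = URIRef("http://wikidata.org/item/ExampleItem")
external_resource = URIRef("http://dbpedia.org/resource/Example")

# Declare that the two resources describe the same thing.
g.add((wikidata_item, OWL.sameAs, external_resource))

# Export as Turtle; a consumer may or may not apply sameAs semantics.
print(g.serialize(format="turtle"))
```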
Re: [Wikidata-l] Archiving references for facts?
Maybe this GSoC project (from 2011) will be relevant: http://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks/Design

Best regards,
Helder

On Sun, Apr 1, 2012 at 06:31, emijrp emi...@gmail.com wrote:

Hi all;

I have read that every fact for every entity must include a reference. How is Wikidata going to deal with dead links? I hope we can work on this by developing an archivist bot to archive links into WebCitation or the Internet Archive. This is an old problem in all Wikipedias, and it is rarely correctly addressed (the only example I know is the French Wikipedia using Wikiwix.com to archive references and external links).

Regards,
emijrp
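A minimal sketch of what such an archivist bot could do, assuming the Internet Archive's public "Save Page Now" endpoint (https://web.archive.org/save/<url>); the endpoint choice, header handling, and lack of rate limiting are illustrative, and WebCitation would need its own submission mechanism:

```python
import requests

def archive_url(url, timeout=30):
    """Ask the Internet Archive's Wayback Machine to snapshot `url`.

    Returns the archived copy's URL on success, or None on failure.
    A production bot would also need rate limiting and retry logic.
    """
    resp = requests.get("https://web.archive.org/save/" + url, timeout=timeout)
    if resp.ok:
        # The snapshot location is usually reported in a header;
        # fall back to the final response URL if the header is absent.
        return resp.headers.get("Content-Location", resp.url)
    return None

if __name__ == "__main__":
    print(archive_url("http://example.org/some-reference"))
```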
Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink
On 1 April 2012 13:04, Markus Krötzsch markus.kroetz...@cs.ox.ac.uk wrote:

This is a valid point. It is intended to address this as follows:

* Wikidata items (our content pages) will be in *exact* correspondence to (zero or more) Wikipedia articles in different languages.
* Differences in scope will lead to different Wikidata items.
* Relationships such as broader or narrower can be expressed as relations between these items, if desired.

This is a technically valid solution. Socially, I fear it would lead to endless uncertainty about which mechanism to use. Few abstract entities will have exactly the same delimitation/width, but where should one switch from one method of linking (one Wikidata page with several more or less closely matching Wikipedia pages) to the other (several Wikidata pages, one for each Wikipedia page in each language)?

Also, importing data will be a nightmare, because the concepts used in imported data will have to be compared with all Wikipedias. One Wikipedia language version covers the post-WWII extent of Russia as well as the current one, and another Wikipedia language version has them separated. It may not have mattered before, and only one Wikidata page links to both language versions. However, at some point historical data are imported and suddenly Wikidata needs to be reorganized to have two pages.

... Just thinking out loud - this may be unavoidable perhaps... However, my gut feeling is that if you plan to avoid relations between Wikidata and Wikipedia, it might be a more comprehensible model to always use only one method, i.e. have only a 0-to-1 or 1-to-1 relation between a Wikidata page and a Wikipedia page, and express everything else as Wikidata-to-Wikidata page relations. These relations are then easily traceable and updateable, just as the broadness or narrowness of a page in a given Wikipedia develops over time.

In general, Wikidata will not be able to replace all interwiki links: it will remain possible to define additional links in each Wikipedia to cover cases where the relationship between articles is not exact.

This worries me. It means that there will forever be conflicting systems of editing interwiki links. If everything can be achieved with Wikipedia, but only a subset with Wikidata, it spells social adoption danger.
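A minimal sketch of the one-sitelink-per-language model Gregor describes (at most one Wikipedia article per language per item, with scope differences expressed as item-to-item relations). The item identifiers, the "narrower than" relation name, and the data layout are illustrative assumptions, not the Wikidata data model:

```python
# Each item keeps at most one sitelink per language (0-to-1 / 1-to-1),
# and differences in scope are expressed as item-to-item relations rather
# than as extra interwiki links. All identifiers and titles are made up.
items = {
    "item-broad": {
        "sitelinks": {"en": "Example topic", "de": "Beispielthema"},
        "relations": [],
    },
    "item-narrow": {
        "sitelinks": {"fr": "Sujet (sens restreint)"},
        "relations": [("narrower than", "item-broad")],
    },
}

def sitelink(item_id, language):
    """Return the single linked article for this item in this language, or None."""
    return items[item_id]["sitelinks"].get(language)

print(sitelink("item-broad", "de"))        # -> "Beispielthema"
print(items["item-narrow"]["relations"])   # -> [('narrower than', 'item-broad')]
```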
Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink
Scope is also called domain by some language folks. Basically, two entries can be textually identical but still describe completely different topics, for example "web" as in fabric and as in networking. In Wikipedia, similar concepts often get a common article, often without the differences being stated explicitly. Sometimes differences go unnoticed because of cultural differences. Those can be very difficult to solve.

Jeblad
Re: [Wikidata-l] Archiving references for facts?
Dear all,

I don't think this is a difficult problem; two points should be clarified first:

1. Almost all facts on Wikipedia need not be sourced.
2. Sourcing cannot inform us of the truth/falsehood of a fact - at best, it indicates the authority of its source.

I am not a lawyer, only an Information Specialist, so the following should be double-checked with legal. Fact: Google caches practically anything it indexes. However, attempts by some site owners to claim this cache as a violation of their copyright have been consistently dismissed by US law courts. I am not even sure what legal grounds were used, but caching is considered part of web technology, and e.g. browser caches are also not considered copyright violations.

On the drawing board of the next-generation search engine is a content analytics capability for doing authority assessment of references using:

1. A transitive bibliometric authority model.
2. A metric of reference longevity (a fad vs. fact test).
3. Bootstrapping using content analysis where access to the full text of the source is available.

While the above model is complex, it would take (me) about 2 weeks of work to set up a prototype reference repository - a Nutch (crawler) + Solr + storage (say MySQL/HBase/Cassandra) combination to index external links, references including URLs, and references with no URLs. This data would be immediately consumable over HTTP using standards-based requests (a Solr feature; see the query sketch below). Adding integration with the existing search UI would probably take another 2 weeks, as would adding support for caching/indexing the most significant non-HTML document formats. However, it would not be able to access content behind paywalls without access to a password.

If and only if WMF sanctions this strategy, I could also draft a 'win-win' policy for encouraging such hidden-web resource owners to provide free access to such a crawler, and possibly even open up their paywalls to our editors, e.g. removing the nofollow directive from links to partners with high WP:RS standing.

I hope this helps.

Oren Bochman
MediaWiki Search Developer

From: wikidata-l-boun...@lists.wikimedia.org [mailto:wikidata-l-boun...@lists.wikimedia.org] On Behalf Of John Erling Blad
Sent: Sunday, April 01, 2012 10:01 PM
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata-l] Archiving references for facts?

Archiving a page should be pretty safe as long as the archived copy is only for internal use, which means something like OTRS. If the archived copy is republished, it _might_ be viewed as a copyright infringement. Still, note that the archived copy can be used for automatic verification, i.e. extracting a quote and checking it against a stored value, without infringing any copyright. If a publication is withdrawn, it might be an indication that something is seriously wrong with the page, and no matter what the archived copy at WebCitation says, the page can't be trusted. It's really a very difficult problem.

Jeblad

On 1. apr. 2012 14.08, Helder helder.w...@gmail.com wrote:
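As a minimal sketch of the "consumable via HTTP" point in Oren's mail: assuming a Solr core named "references" with fields such as "cited_url" and "wiki_page" (core and field names are illustrative assumptions), a standards-based query against Solr's /select handler could look like this in Python:

```python
import requests

# Query a hypothetical Solr core that indexes citation / external-link data.
# Core name ("references") and field names ("cited_url", "wiki_page") are
# illustrative; the /select handler and its q/fq/wt/rows parameters are standard Solr.
SOLR_SELECT = "http://localhost:8983/solr/references/select"

params = {
    "q": "wiki_page:Berlin",          # all references cited from a given article
    "fq": "cited_url:[* TO *]",       # keep only references that carry a URL
    "wt": "json",                     # ask Solr for a JSON response
    "rows": 20,
}

resp = requests.get(SOLR_SELECT, params=params, timeout=10)
resp.raise_for_status()

for doc in resp.json()["response"]["docs"]:
    print(doc.get("cited_url"))
```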