Re: [Wikidata-l] Multilinguistics matters

2012-04-01 Thread Lydia Pintscher
Hey :)

On Sat, Mar 31, 2012 at 2:18 PM, JFC Morfin jef...@jefsey.com wrote:

 Lydia,

 I have a question. I am interested in multilinguistics. I define
 multilinguistics as the cybernetics of mecalanguages, i.e. the
 operational, computational, and strategically pragmatic coexistence of
 natural (and artificial) languages that can be used between man and man,
 man and machine, and machine and machine. I am, therefore, interested in
 stabilizing a table of all the (meca)language names and specifics, and I
 have been working for years on the preparation of a WikiLinguae project,
 trying to bring together the teams behind the ISO, private, and open-source
 tables within the loose framework of the MAAYA network (members:
 http://maaya.org/spip.php?article40, including UNESCO, ITU, ACALAN,
 LinguaMon, Union Latine, Francophonie, AUF, etc.), as well as to work on
 script homography, which is a DNS problem, etc.

 The problem extends to support for IDNA2008 consistency (IDNs), e-mail
 addresses, variants (i.e. the same term with different printings, or
 different Unicode code points), polynymy issues (i.e. strict cross-language
 synonyms), and variances (i.e. the variation of the semantics of a term, or
 the appearance of a new term), both in data definitions and in data values.
 I would like to know if this area is part of the Wikidata project, or how
 Wikidata plans to address it.

Hmmm, I have to confess I don't understand this completely and therefore
can't give you an answer. Could you give me an example of what you are
talking about and how you see Wikidata fitting in?

 I would also like to know, from the very beginning, whether this WikiLinguae
 project is to be kept separate, whether it should/could ally with Wikidata,
 or whether Wikidata has its own project regarding multilinguistics (*). I
 note here that it should be both multilinguistic (documenting every
 language) and polylingual (documenting them in every language). In addition,
 should there be a Wikimedia extension/version of the locale files that would
 be documented cooperatively (whether or not in cooperation with other locale
 directories)?

 Thank you.
 jfc

 (*) Multilinguistics by nature is a semiotic discipline with semantics,
 syntax, and pragmatics. It treats every language as equal to the others. It
 should not be confused with globalization (e.g. Unicode), which is:

 * the internationalization of the medium (support of ISO 10646) within an
 English framework,
 + the localization of the ends (ISO 15897),
 + the filtering of linguistic quoted exchanges as per their language (ISO
 639), script (ISO 15924), and administrative authority as a cultural
 referent (ISO 3166-1),
 * and results in langtags (RFC 5646).
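
 For illustration, here is a toy decomposition of a langtag (RFC 5646) into
 its main subtags - a simplified sketch in Python that ignores extlang,
 variant, extension, and private-use subtags:

     import re

     # Simplified langtag shape: language[-script][-region] only.
     LANGTAG = re.compile(
         r"^(?P<language>[A-Za-z]{2,3})"             # ISO 639 language code
         r"(?:-(?P<script>[A-Za-z]{4}))?"            # ISO 15924 script code
         r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"  # ISO 3166-1 / UN M.49 region
     )

     def parse_langtag(tag):
         m = LANGTAG.match(tag)
         return m.groupdict() if m else None

     # parse_langtag("zh-Hant-TW")
     # -> {'language': 'zh', 'script': 'Hant', 'region': 'TW'}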

 Globalization and an open langtag-compatible format should be sufficient at
 this stage (due to the limited number of Wikipedia languages) if there is no
 specific need for Wikimedia formats or locale file extensions. Also, the
 WikiLinguae project seeks to be a wiki-based, ISO 11179-conformant reference
 spine in the matter of languages and cultures: this is not a small task
 and may still take time.





-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Eisenacher Straße 2
10777 Berlin
www.wikimedia.de

Wikimedia Deutschland - Society for the Promotion of Free Knowledge (e. V.)

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.



[Wikidata-l] Conservadata

2012-04-01 Thread Denny Vrandečić
Since some claim that Wikidata's model - representing statements with
their references and the diversity of knowledge in the world outside - is
just a way to bring Wikipedia's values to the world of data, Conservapedia
has launched an alternative approach to the challenges we are tackling. In
good open-source manner, we hope that we can share code, and we wish
Conservadata all the best.

http://blog.tommorris.org/post/20277406012/conservadata

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Society for the Promotion of Free Knowledge (e.V.)
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.


Re: [Wikidata-l] Notability in Wikidata

2012-04-01 Thread Markus Krötzsch
In general, policies for notability in Wikidata will be governed by the 
community of (all) Wikidata editors. On the technical side, we aim to 
achieve two things:


* The system should be able to handle a lot of data.

* The interfaces and data access features should minimize the negative 
impact that additional (correct but not very important) data has on usage.


Of course, both goals have their limits, and there will always be good 
(technical or social) reasons not to include everything. We would rather 
support linking and data integration with external databases than suggest 
that *every* fact in the world be copied to Wikidata.


Markus


On 31/03/12 20:22, emijrp wrote:

Hi all;

I'm thinking about notability in Wikidata and how it may conflict with
current Wikipedia policies and community conceptions. Will Wikidata
allow the creation of entities for small villages, asteroids, galaxies,
stars, species, etc., that are not allowed today on Wikipedia? Including
those that don't have an article in any Wikipedia?

I will be happy if so.

Regards,
emijrp






--
Dr. Markus Kroetzsch
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529   http://korrekt.org/



Re: [Wikidata-l] Multilinguistics matters

2012-04-01 Thread JFC Morfin

Dear Lydia,

Hmmm, I have to confess I don't understand this completely and therefore
can't give you an answer. Could you give me an example of what you are
talking about and how you see Wikidata fitting in?


Hmmm :-) I am somewhat at a loss here. Let's start from some basics, then.

1) Is there a concise, consensual definition of what Wikidata is
expected to achieve?

2) Does the Wikidata project have a charter or terms of reference (ToR),
or a dedicated web site whose URL you could give? http://wikidata.org is
redirected to http://en.wikipedia.org/wiki/Main_Page.

3) Is there a list of the needs of Wikimedia projects that Wikidata is
to address?

4) What is/are going to be the language(s) of Wikidata?

5) What is Wikidata's architectural framework? The WDE (whole digital
ecosystem)? Human communications? Internet services? User
applications? Wikimedia?

6) What are the normative choices, strategic objectives and
constraints imposed by the sponsors and the WMF?

7) How is the Wikidata project to liaise with its potential users'
representatives (depending on the response to (5))?

Thank you !
jfc




Re: [Wikidata-l] Data model (RDF)

2012-04-01 Thread Herman Bruyninckx

On Sun, 1 Apr 2012, Markus Krötzsch wrote:


A very interesting discussion. Some general answers to this are:

* Wikidata does not, of course, intend to implement complex reasoning (or any 
other algorithm that qualifies as complex).


* If useful for serving its requirements, Wikidata will not exclude modelling 
features just because they are also supported in OWL ;-) For example, it 
could be useful to say that a Wikidata item describes the same thing as an 
external resource, which can be done in OWL using sameAs. Many communities 
could use this for integrating Wikidata information with other Web databases 
(a small sketch follows these points).


* The reasoning support in Wikidata will not in general limit the modelling 
support in Wikidata: it might be possible to say something that has a formal 
meaning in OWL, even if this formal meaning is not relevant for query 
answering in Wikidata (sameAs with external resources is a possible example, 
since Wikidata would surely not pull data from these sources for internal 
query answering).


* Wikidata will support various export formats, which have more or less 
native support for certain modelling features. We will use whatever 
expressivity is available in the given format to describe the Wikidata 
information as accurately as possible. This might again lead to some OWL 
constructs being used in RDF/OWL exports. All Wikidata content will have a 
formal meaning, and we will draw from existing experience and standards for 
defining this so that it is as widely compatible as possible.
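
To illustrate the sameAs point above: a minimal sketch, using Python's
rdflib, of what such a link could look like in an RDF export (the item URI
is hypothetical, since Wikidata's URI scheme is not fixed yet):

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    g = Graph()
    # Hypothetical Wikidata item URI and an external resource it describes.
    item = URIRef("http://wikidata.org/item/DouglasAdams")
    external = URIRef("http://dbpedia.org/resource/Douglas_Adams")
    g.add((item, OWL.sameAs, external))
    print(g.serialize(format="turtle"))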


In summary, it is not about endorsing or rejecting a particular ontology 
language. We will be open and inclusive with what we support, and user 
requirements will be the main guideline for defining what can be said in 
the system.


This sounds good to me. (Not least because this would allow data
representations that are optimized for certain classes of knowledge, e.g.
Topic Maps, or mathematical/physical relationships.)

But it triggers the obvious question: when and how will such discussions
(and decision making) take place in the course of the coming year?


Best regards,

Markus


Best regards,

Herman Bruyninckx


On 01/04/12 08:54, Ivan Herman wrote:


On Mar 31, 2012, at 11:17, Jakob Voss wrote:


JFC Morfin wrote:



2. Since we have a W3C expert: what is the best document/book to get
comprehensive and clear (not too massive) documentation on the
Semantic Web?


You surely don't want to know all about the Semantic Web - especially the
ontology stuff with OWL dialects and entailment regimes is far too
academic and won't be part of Wikidata because of computational
complexity anyway. In short, you should be *very sceptical* and
cautious every time you stumble upon anything that requires inference
rules. Even trivial inference rules such as those based on owl:sameAs
and rdf:type can be problematic in practice! The less inference you
assume, the better.



Let us avoid the all-too-simplistic view that says Semantic Web == OWL :-)

Indeed, bringing (OWL) inferencing into the core WD project would be a 
mistake. From the SW stack, RDF, RDFS, and, on a different note, SPARQL and 
maybe RDB2RDF should be the technologies having a role in the project, as 
well as Linked Data patterns in general.
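
To make this concrete, a minimal sketch of the kind of SPARQL usage meant
here, via Python's rdflib; the namespace and data are invented for the
example:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDFS

    WD = Namespace("http://wikidata.org/item/")  # hypothetical namespace
    g = Graph()
    g.add((WD.Berlin, RDFS.label, Literal("Berlin", lang="de")))

    query = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?item ?label WHERE { ?item rdfs:label ?label . }
    """
    for row in g.query(query):
        print(row.item, row.label)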


That being said, it is probably good to have the vocabularies being used in 
WD be properly defined/described. If *somebody else* wants to do 
inferencing, for example, we should not stand in the way.


Ivan



I can recommend the Linked Data Patterns book by Dodds and Davis:
http://patterns.dataincubator.org/book/


Indeed. That is a great one, too.

Ivan


Jakob


Re: [Wikidata-l] Archiving references for facts?

2012-04-01 Thread Helder
Maybe this GSoC project (from 2011) will be relevant:
http://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks/Design
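
For a rough idea of the core step such an archiving approach would need
(cf. the archivist-bot idea quoted below), here is a minimal sketch in
Python that submits a URL to the Internet Archive's Wayback Machine; the
save endpoint's behavior and the Content-Location response header are
assumptions to verify:

    import requests

    def archive_url(url):
        # Ask the Wayback Machine to snapshot the given reference URL.
        resp = requests.get("https://web.archive.org/save/" + url, timeout=60)
        resp.raise_for_status()
        # The archived copy's path is (assumed to be) returned in a header.
        path = resp.headers.get("Content-Location")
        return "https://web.archive.org" + path if path else None

    # e.g. archive_url("http://example.org/cited-source")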

Best regards,
Helder

On Sun, Apr 1, 2012 at 06:31, emijrp emi...@gmail.com wrote:
 Hi all;

 I have read that every fact for every entity must include a reference. How
 is Wikidata going to deal with dead links? I hope we can work on this by
 developing an archivist bot to archive links into WebCitation or the
 Internet Archive. This is an old problem in all Wikipedias, and it is rarely
 addressed correctly (the only example I know is the French Wikipedia using
 Wikiwix.com to archive references and external links).

 Regards,
 emijrp





Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink

2012-04-01 Thread Gregor Hagedorn
On 1 April 2012 13:04, Markus Krötzsch markus.kroetz...@cs.ox.ac.uk wrote:
 This is a valid point. We intend to address this as follows:
 * Wikidata items (our content pages) will be in *exact* correspondence to
 (zero or more) Wikipedia articles in different languages.
 * Differences in scope will lead to different Wikidata items.
 * Relationships such as broader or narrower can be expressed as
 relations between these items, if desired.

This is a technically valid solution. Socially, I fear it would lead
to endless uncertainty about which mechanism to use. Few abstract entities
will have exactly the same delimitation/width, but where should one
switch from one method of linking (one Wikidata page with several more or
less closely matching Wikipedia pages) to the other (several Wikidata
pages, one for each Wikipedia page in each language)?

Also, importing data will be a nightmare, because the concepts used in
imported data will have to be compared with all Wikipedias. Say one
Wikipedia language version covers the post-WWII extent of Russia as well
as the current one, while another language version has them
separated. It may not have mattered before, and only one Wikidata page
links to both language versions. However, at some point historical data
are imported, and suddenly Wikidata needs to be reorganized to have
two pages. ... Just thinking aloud - this may be unavoidable perhaps...

However, my gut feeling is that if you plan to avoid relations between
Wikidata and Wikipedia, it might be a more comprehensible model to
always use only one method, i.e. to have only a 0-to-1 or 1-to-1
relation between a Wikidata page and a Wikipedia page, and to express
everything else as Wikidata-to-Wikidata page relations. These
relations are then easily traceable and updatable, just as the
broadness or narrowness of a page in a given Wikipedia develops over
time.
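
To make the two alternatives concrete, here is a purely illustrative sketch
in Python; the identifiers and data shapes are hypothetical, not the actual
Wikidata model:

    # Model A: one Wikidata item linked to several loosely matching articles.
    item_russia = {
        "id": "item/russia",
        "sitelinks": {"en": "Russia", "de": "Russland"},  # scopes may differ
    }

    # Model B: at most one article per item, plus explicit item-to-item
    # relations carrying the broader/narrower information.
    items = {
        "item/russia-current": {"sitelinks": {"en": "Russia"}},
        "item/russia-postwar": {"sitelinks": {"de": "Russland (1945-1991)"}},
    }
    relations = [
        ("item/russia-postwar", "related-to", "item/russia-current"),
    ]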

 In general, Wikidata will not be able to replace all interwiki links: it
 will remain possible to define additional links in each Wikipedia to cover
 cases where the relationship between articles is not exact.

This worries me. It means that there will forever be conflicting
systems for editing interwiki links. If everything can be achieved within
Wikipedia, but only a subset with Wikidata, it spells social adoption
danger.



Re: [Wikidata-l] Data_model: Metamodel: Wikipedialink

2012-04-01 Thread John Erling Blad
Scope is also called domain by some language folks. Basically, two entries
can be textually identical but still describe completely different topics.
For example, web as in fabric and as in networking.

In Wikipedia, similar concepts often get a common article, often
without explicitly stating the differences.

Sometimes differences go unnoticed because of cultural differences. Those
can be very difficult to solve.

Jeblad
On 1. apr. 2012 21.25, Gregor Hagedorn g.m.haged...@gmail.com wrote:

 [full quote of the previous message trimmed]


Re: [Wikidata-l] Archiving references for facts?

2012-04-01 Thread Oren Bochman
Dear all,

I don't think this is a difficult problem, but two points should be
clarified first:

1. Almost all facts on Wikipedia need not be sourced.

2. Sourcing cannot inform us of the truth or falsehood of a fact - at best,
it indicates the authority of its source.

 

I am not a lawyer, only an information specialist, so the following should
be double-checked with legal:

Fact: Google caches practically anything it indexes.

However, attempts by some site owners to claim this cache as a violation of
their copyright have been consistently dismissed by US courts.

I am not even sure what legal grounds were used, but caching is considered
part of web technology; browser caches, for example, are also not considered
copyright violations.

 

On the drawing board of the Next Generation Search Engine is a content
analytics capability for doing authority assessment of references, using:

1. A transitive bibliometric authority model.

2. A metric of reference longevity (a fad-vs-fact test).

3. Bootstrapping using content analysis where access to the full text of
the source is available.
 

While the above model is complex, it would take (me) about two weeks of work
to set up a prototype reference repository --

a Nutch (crawler) + Solr + storage (say MySQL/HBase/Cassandra) combination
to index:

external links,

references including URLs,

references with no URLs.

This data would be immediately consumable via HTTP using standards-based
requests (a Solr feature).
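
For example, a client could fetch indexed references with a plain HTTP
request. A minimal sketch in Python; the host, core name, and field names
here are hypothetical:

    import requests

    # Query the (hypothetical) "references" core of the prototype repository.
    resp = requests.get(
        "http://localhost:8983/solr/references/select",
        params={"q": 'url:"example.org"', "wt": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("url"), doc.get("title"))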

 

 

Adding integration with the existing search UI would probably take another
two weeks, as would adding support for caching/indexing the most significant
non-HTML document formats.

 

However, it would not be able to access content behind paywalls without
access to a password. If, and only if, WMF sanctions this strategy, I could
also draft a 'win-win' policy for encouraging such hidden-web resource
owners to provide free access to such a crawler, and possibly even to open
up their paywalls to our editors -

e.g. by removing the nofollow directive from links to highly reliable
(WP:RS) partners.

 

I hope this helps.

 

 

Oren Bochman.

 

MediaWiki Search Developer.

 

From: wikidata-l-boun...@lists.wikimedia.org
[mailto:wikidata-l-boun...@lists.wikimedia.org] On Behalf Of John Erling
Blad
Sent: Sunday, April 01, 2012 10:01 PM
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata-l] Archiving references for facts?

 

Archiving a page should be pretty safe as long as the archived copy is only
for internal use, meaning something like OTRS. If the archived copy is
republished, it _might_ be viewed as a copyright infringement.

Still, note that the archived copy can be used for automatic verification,
i.e. extracting a quote and checking it against a stored value, without
infringing any copyright. If a publication is withdrawn, it might be an
indication that something is seriously wrong with the page, and no matter
what the archived copy at WebCitation says, the page can't be trusted.
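
As an illustration of that verification idea - a toy sketch in Python, with
the normalization and data shapes made up for the example:

    import re

    def normalize(text):
        # Collapse whitespace and case so trivial edits don't break matching.
        return re.sub(r"\s+", " ", text).strip().lower()

    def quote_still_present(stored_quote, archived_text):
        # True if the stored quotation still occurs in the archived copy.
        return normalize(stored_quote) in normalize(archived_text)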

It's really a very difficult problem.

Jeblad

On 1. apr. 2012 14.08, Helder helder.w...@gmail.com wrote:
