[Wikidata] Re: History of some original Wikidata design decisions?

Goran Milovanovic Wed, 28 Jul 2021 10:27:06 -0700

Hi,

*Denny*, *Thad*:


A sketchy thought from yours sincerely, a Data Scientist:

>> I think that it is entirely possible to layer a prototype semantics over
>> Wikidata...

is quite doable, given that

- we first remove the "catalogues" from Wikidata, e.g. scholarly articles
and astronomical objects, so as to
- enable some decent level of semantic coherence to emerge there, and
finally
- introduce a property like "associated to" that has a semantic
relatedness (i.e. associative strength) score as its value (never mind the
scale for now);
   - essentially, the Wikidata Concepts Monitor
   <https://wikidata-analytics.wmcloud.org/app/WikidataAnalytics> system
   computes such values across a selection of large Wikidata classes, relying
   on
      - the re-use data for Wikidata entities across all WMF projects that
      have client-side Wikidata entity usage tracking enabled
      <https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage> (i.e.
      using Wikipedia and other projects as its referential worlds).
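
To make the "associated to" idea a bit more concrete, here is a minimal
sketch of how an associative strength score could be derived from entity
re-use counts. Everything in it is an assumption for illustration only: the
entity IDs and counts are invented placeholders, the function names are
hypothetical, and cosine similarity is swapped in as a simple stand-in; it is
not claimed to be what the Wikidata Concepts Monitor actually computes.

```python
from math import sqrt

# Hypothetical re-use data: entity -> {project: number of pages using it}.
# IDs and counts are made-up placeholders, not real usage figures.
usage = {
    "Q_a": {"enwiki": 120, "dewiki": 80, "frwiki": 40},
    "Q_b": {"enwiki": 60, "dewiki": 40, "frwiki": 20},
    "Q_c": {"commonswiki": 200},
}


def association(a, b):
    """Cosine similarity between two sparse usage vectors (0.0 if either is empty)."""
    dot = sum(count * b.get(project, 0) for project, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def associated_to(entity, usage_table):
    """Return (other_entity, score) pairs, strongest association first."""
    scores = [
        (other, association(usage_table[entity], vector))
        for other, vector in usage_table.items()
        if other != entity
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in associated_to("Q_a", usage):
        print(f"Q_a associated to {other}: {score:.3f}")
```

In this toy data, Q_a and Q_b are re-used in the same projects in the same
proportions, so they come out as strongly associated, while Q_c shares no
project with Q_a and scores zero. The score would then be the value of the
"associated to" statement.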

In effect, we would end up having a Wikidata that provides a basis for both
(A) a "good old symbolic AI" approach (properties, ontology; what we have
now), and (B) statistical learning based approaches (by
providing entity associations) - a quite desirable outcome I'd say.

Cheers,
GSM

Goran S. Milovanović, PhD
Data Scientist, Software Department
Wikimedia Deutschland

------------------------------------------------
"It's not the size of the dog in the fight,
it's the size of the fight in the dog."
- Mark Twain
------------------------------------------------


On Mon, Jul 26, 2021 at 9:00 PM David McDonell <[email protected]> wrote:

> Seconded!!
>
> On Mon, Jul 26, 2021 at 12:47 PM Samuel Klein <[email protected]> wrote:
>
>> Wow :)  Thanks for that, Dan!
>>
>> On Mon, Jul 26, 2021 at 11:43 AM Dan Brickley <[email protected]> wrote:
>>
>>>
>>>
>>> On Mon, 26 Jul 2021 at 11:58, Jan Dittrich <[email protected]>
>>> wrote:
>>>
>>>> I would be very interested in Wikidata's relation to Cyc
>>>> <https://en.wikipedia.org/wiki/Cyc> on one hand and the Semantic Web
>>>> on the other.
>>>>
>>>
>>> This isn’t written down well in one place yet.
>>>
>>> Here is one strand of history, emphasising the line from Cyc via Guha’s
>>> later work on MCF.
>>>
>>> CycL inspired Apple MCF, which got XMLified by Tim Bray when Guha took
>>> it to Netscape. In June ‘97 it was submitted to W3C by Netscape. It combined with
>>> requirements from W3C content labeling work (PICS), where there was
>>> interest in adding more decentralized expressivity (eg to support Dublin
>>> Core and other schemas being combined in one “label”), complex structures
>>> and datatyped property values, aka Signed PICS labels and PICS-NG. While
>>> PICS and PICS-NG had an s-expression based syntax, RDF (like the 1997
>>> iteration of MCF) went with XML. At the time XML was being invented by
>>> stripping SGML down into something that might suit the Web. Microsoft
>>> submitted XML-Data to W3C mid 97 too (as well as later a revision, breaking
>>> W3C etiquette). XML-Data shared some goals with RDF but not its graph data
>>> model. RDF and other use cases led to XML Namespaces being an important
>>> thing. As XML popularity grew, RDF was under pressure since it didn’t
>>> engage much with the SGML heritage. The RDFS WG launched just after the RDF
>>> Model + Syntax spec was announced at Dublin Core’s conference in Finland.
>>> This being the “browser wars” era both RDF and RDFS were under huge
>>> pressure to be completed quickly. RDFS included a small subset of the
>>> schema-defining machinery from MCF. The RDF M+S WG produced an RDF
>>> recommendation in Feb 1999 but RDFS was left in limbo, in part because the
>>> XML community were wary of being forced to build XML Schema on top of it.
>>> Meanwhile from 1998 a small but enthusiastic community started to build
>>> around RDF - experimenting with query languages, databases, integration
>>> with inference engines, APIs etc., alongside continued support from
>>> Netscape who used the technology heavily for everything from RSS feeds,
>>> sitemaps, “what’s related” annotation services, open data (dmoz) dumps, to
>>> their own browser’s internal data source APIs (xul templates, bookmarks,
>>> mail, ..). On the standards track, W3C management backed off from RDF work
>>> to reflect the concerns of its membership, who tended to much prefer XML.
>>> Meanwhile the US military research agency DARPA had been persuaded by an
>>> academic turned staffer (Jim Hendler) who had worked on similar early
>>> technology (SHOE, PIQ) that they should fund research to standardize a
>>> DARPA Agent Markup Language. A DAML / W3C collaboration led to the
>>> RDF-oriented W3C team at MIT receiving DARPA funding to continue the work
>>> area that had not engaged the XML-centric interest of W3C’s membership (ie
>>> Advisory Committee). Alongside this, RDF/S had engaged the interests of
>>> European researchers working around logic-based KR languages, eg f-logic,
>>> description logics etc., resulting in DAML (US) and OIL (description logic
>>> EU research project outcomes) collaborating via an ad hoc transatlantic
>>> committee to produce DAML+OIL, a first draft of a more complicated language
>>> that sat on top of RDF. The W3C MIT DARPA funding supported a “Semantic Web
>>> Advanced Development” activity that operated in the grey area of W3C’s
>>> “non member-funded activity”, and which served in particular to bring
>>> DAML+OIL into W3C as a new work item. This next phase of RDF work at W3C was
>>> broadly in line with the RDF roadmap and expectations from the 1997
>>> Metadata Activity, but rebranded “Semantic Web” to reflect several
>>> considerations. Firstly that RDF was clearly more powerful and expressive
>>> than a simple metadata format might need. Secondly, by this point RDF was
>>> pretty unpopular in several contexts - and seen as draining staff resources
>>> and attention from W3C membership priorities (XML, Web Services, etc.).
>>> Renaming from RDF allowed a fresh start. Calling it Semantic Web tied into
>>> Tim-BL’s interest and writing in the area, and had a more “visionary” feel,
>>> allowing for a message that it was a longer term investigation, therefore
>>> not a competitor to XML Schema, SOAP, XQuery and so on. So now we had PICS
>>> and MCF having mutated into RDF/S for graph data, and then simultaneously a
>>> rebranding of the exercise as Semantic Web, with a big dose of “futuristic”
>>> and “researchy”. Conferences and journals and such started to appear,
>>> initially with much more focus on the “semantics” part, rather than the
>>> “web”. This was the cause for the second great half-hearted renaming, which
>>> grew from the growing split between those of us who were in this for
>>> web-based data sharing, integration, feeds, sitemaps, RSS, FOAF and so
>>> on, and those who were more “semantics first”, with a passion for finding
>>> efficient subsets of Description Logic. Around the mid-2000s the earlier
>>> experimental RDF query languages solidified into SPARQL, which was broadly
>>> in the “data access” side of the community. This is another place that the
>>> Cyc and MCF heritage showed up, since most practical RDF systems had a
>>> notion of source or context attached at the triple or graph level,
>>> corresponding to the notion of “layers” in MCF (and very loosely to Cyc
>>> contexts). So this kind of takes us to the time when we had RDF/S, OWL,
>>> SKOS, SPARQL … and things like DBpedia and the LOD cloud were refining the
>>> data-linking “hypertext RDF” work we’d started in the FOAF project, with a
>>> TimBL-fueled passion for every entity being given a URI that can serve up
>>> RDF when dereferenced. A good number of public open datasets were published
>>> this way, although applications and usage tended to lag. This brings us to
>>> the era of rich snippets, Google acquiring Freebase, renaming it Knowledge
>>> Graph and then stepping back from the role that Wikidata was more effective
>>> at filling…
>>>
>>>
>>> OK, that was a giant biased brain dump, but I think mostly true, and
>>> about 25 years of underdocumented history squeezed into a paragraph.
>>>
>>> Dan
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> Jan
>>>>
>>>> On Fri, 23 Jul 2021 at 01:57, Denny Vrandečić <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Thad,
>>>>>
>>>>> Thanks for asking the questions, and thanks Tobi for the pointers.
>>>>> Man, what a lengthy post it was.
>>>>>
>>>>> I understand that the post answered most of your questions. I think
>>>>> that it is entirely possible to layer a prototype semantics over Wikidata,
>>>>> just as the DL semantics have been layered over it. I don't remember if
>>>>> such work has been done before.
>>>>>
>>>>> Regarding ISO 5964, I think I probably have looked through it at some
>>>>> point, but I don't remember it anymore. SKOS has certainly been a stronger
>>>>> influence, and obviously OWL.
>>>>>
>>>>> I hope that helps with the historical deep dive :) Lydia and I really
>>>>> should write that book!
>>>>>
>>>>> Cheers,
>>>>> Denny
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jul 10, 2021 at 3:00 PM Thad Guidry <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> *Tobi - *Blog post 3 is very helpful.  It shows that Denny and
>>>>>> I think alike and agree on everything. :-)  In particular, his dislike of
>>>>>> strong classification,
>>>>>> which is part of my basis for allowing, and using, weak relations much more.
>>>>>> As for how to allow them, I think the only way, given the current data
>>>>>> model, is through properties.
>>>>>> There are many ways, and SKOS is one way to allow expressing weak
>>>>>> relations and we already have some good support with existing properties
>>>>>> like P4390 mapping relation type
>>>>>> <https://www.wikidata.org/entity/P4390> and a host of others.
>>>>>>
>>>>>> Denny and I also fear the same things, like not having a flexible
>>>>>> enough system to describe our complex world that doesn't always fit into
>>>>>> strict rules.  Which is kinda why I've always liked
>>>>>> https://www.w3.org/TR/skos-primer/#secassociative
>>>>>> because of its non-transitivity which allows much flexibility and as
>>>>>> he and I would say... avoid "Barbara". :-)
>>>>>> Which is pretty much summarized in
>>>>>> https://www.w3.org/TR/skos-primer/#secadvanced
>>>>>>
>>>>>> Sorry for all the SKOS links, but semantic relations help to describe
>>>>>> human knowledge.  How a system represents or portrays semantic relations is
>>>>>> where choices are made or have been made.  *And I think the right
>>>>>> choices were definitely made.*
>>>>>> Overlaying SKOS and the Wikidata properties that sprinkle it into the
>>>>>> data model is useful, but I've always been kind of reluctant to do
>>>>>> that...probably for the same reasons Denny might give?  Choices between
>>>>>> allowing "semantic accuracy" versus "semantic flexibility".  But I think
>>>>>> systems like SKOS provide both.  Perhaps it could be argued that OWL
>>>>>> provides much less. :-)  Still all KOSs provide great use when they fit
>>>>>> well.  How they can fit over Wikidata, as I said, is probably only through
>>>>>> properties at this late stage of design and that's fine with me!
>>>>>>
>>>>>> Still, my main focus is and always will be trying to add human
>>>>>> knowledge about concept relations into Wikidata to help machines, to help
>>>>>> us.  (the "edges" that humans quickly can deduce in seconds, but still to
>>>>>> this day can sometimes take machines days or weeks to figure out).
>>>>>>
>>>>>> My usage and help to Abstract Wikipedia and Wikidata later on will
>>>>>> primarily be around the mapping of relations ... where a lot of the
>>>>>> possibilities have already been described years and years ago at the very
>>>>>> bottom of this long page:
>>>>>> *inter-KOS mapping relationships  <-- *very last row, 3rd column
>>>>>> https://www.w3.org/TR/skos-primer/#seccorrespondencesISO
>>>>>>
>>>>>>
>>>>>> *Denny - * were you part of, or lightly influenced by, ISO 5964
>>>>>> through the German DIN/ISO work, or not? That would also be good to know.
>>>>>>
>>>>>> Thad
>>>>>> https://www.linkedin.com/in/thadguidry/
>>>>>> https://calendly.com/thadguidry/
>>>>>>
>>>>>>
>>>>>> On Sat, Jul 10, 2021 at 3:17 PM Tobi Gritschacher <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>> It would be nice to have a place to look with a link to a page in
>>>>>>>> the Community portal that says "History of Wikidata's design and early
>>>>>>>> collected meetings, notes, design documents, recordings"
>>>>>>>>
>>>>>>>
>>>>>>> Might not answer your concrete question, but here are some (very)
>>>>>>> early blog posts by Denny. They are still a nice read. :)
>>>>>>>
>>>>>>> 1/3
>>>>>>> https://blog.wikimedia.de/2013/02/22/restricting-the-world/
>>>>>>>
>>>>>>> 2/3
>>>>>>> https://newwwblog.wikimedia.de/2013/06/04/on-truths-and-lies/
>>>>>>>
>>>>>>> 3/3
>>>>>>> https://blog.wikimedia.de/2013/09/12/a-categorical-imperative/
>>>>>>>
>>>>>>> Cheers, Tobi
>>>>>>> _______________________________________________
>>>>>>> Wikidata mailing list -- [email protected]
>>>>>>> To unsubscribe send an email to [email protected]
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jan Dittrich
>>>> UX Design/ Research
>>>>
>>>> Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
>>>> Tel. (030) 219 158 26-0
>>>> https://wikimedia.de
>>>>
>>>> Our vision is a world in which all people can share in the knowledge of
>>>> humanity, use it, and add to it. Help us make it happen!
>>>> https://spenden.wikimedia.de
>>>>
>>>> Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens e. V.
>>>> Registered in the register of associations of the Berlin-Charlottenburg
>>>> district court under number 23855 B. Recognized as a charitable organization
>>>> by the tax office for corporations I Berlin, tax number 27/029/42207.
>>>>
>>>
>>
>>
>> --
>> Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
>>
> --
> David McDonell Co-founder & CEO ICONICLOUD, Inc. "Illuminating the cloud"
> M: 703-864-1203 EM: [email protected] URL: http://iconicloud.com
>

