Re: [Wikidata-l] My WMF IEG proposal: data browser for Wikidata, etc.
Unfortunately my SOLRSearch proposal was deemed ineligible [5], the reason given being that extensions require code review from WMF to deploy, so I hope yours is more successful. -john On 14.02.2013 07:14, Yaron Koren wrote: Hi everyone, If you hadn't heard, the Wikimedia Foundation has started the first round of what is planned to be a twice-annual program to fund individuals and small groups to work on projects that further the WMF's goals - individual engagement grants: http://meta.wikimedia.org/wiki/Grants:IEG [2] I have a proposal currently in place for a project that relates to Wikidata. It's a JavaScript data browser that would provide a drill-down interface to let users navigate through the data of Wikipedia, Wikidata, and any standard MediaWiki-based wiki, without overloading the server with queries. This would be accomplished by using browser-based data storage, which would also enable fast navigation on mobile devices, and even offline use. The downside is that it would mean, in the case of Wikipedia/Wikidata, that it could only store a small fraction of the overall data; but there could be multiple apps for different subject matters - one for cars, one for buildings, one for politicians, and so on. You can read my proposal here: http://meta.wikimedia.org/wiki/Grants:IEG/MediaWiki_data_browser [3] Any feedback is welcome, here or on the proposal talk page; and there's a section at the bottom for endorsements, if you would like to see this project happen.
Thanks, Yaron -- WikiWorks · MediaWiki Consulting · http://wikiworks.com [4] ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l [2] http://meta.wikimedia.org/wiki/Grants:IEG [3] http://meta.wikimedia.org/wiki/Grants:IEG/MediaWiki_data_browser [4] http://wikiworks.com [5] http://meta.wikimedia.org/wiki/Grants_talk:IEG/SOLRSearch
Re: [Wikidata-l] Reification in Wikidata serialisation
Hi - Where can I see the rdfs/owl definition for o:Statement? thanks - john On 01.02.2013 12:09, Daniel Kinzler wrote: On 01.02.2013 14:54, Nicholas Humfrey wrote: While the reification makes sense, we thought that it looked a bit too much like rdf:Statement. w:Berlin s:Population Berlin:Statement1 . Berlin:Statement1 rdf:type o:Statement . Perhaps you could rename o:Statement to o:Fact instead? But it's not a fact. It's a claim someone makes. That may seem like a fine distinction, but it's really fundamental to understanding how Wikidata/Wikibase is different from DBpedia, Freebase, Cyc, etc. Wikidata doesn't collect facts. It collects statements (sourced claims). -- daniel ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
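The reification pattern in the quoted triples can be sketched in a few lines of Python. This is a hypothetical illustration only: the prefixes (w:, s:, o:), the population value, and the source item are made up to show the shape of the pattern, not the actual Wikidata RDF mapping.

```python
# Sketch of the reification pattern discussed above, using plain
# 3-tuples. All prefixes and values here are illustrative only.

# Instead of asserting the claim directly ...
direct = ("w:Berlin", "w:population", 3500000)  # shown for contrast, unused

# ... a statement node is introduced, so the claim itself can carry
# a source and other qualifiers. It is a claim, not a fact:
triples = [
    ("w:Berlin", "s:population", "Berlin:Statement1"),
    ("Berlin:Statement1", "rdf:type", "o:Statement"),
    ("Berlin:Statement1", "o:value", 3500000),
    ("Berlin:Statement1", "o:source", "w:HypotheticalCensus"),
]

def statements_about(subject, graph):
    """Return the statement nodes attached to a subject."""
    return [o for s, p, o in graph
            if s == subject and p.startswith("s:")]

print(statements_about("w:Berlin", triples))  # ['Berlin:Statement1']
```

The extra indirection is what lets each claim carry its own provenance, which is the distinction Daniel draws between statements and facts.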
Re: [Wikidata-l] Reification in Wikidata serialisation
Never mind - the previous note had a link. Thanks. On 01.02.2013 16:30, jmccl...@hypergrove.com wrote: Hi - Where can I see the rdfs/owl definition for o:Statement? thanks - john On 01.02.2013 12:09, Daniel Kinzler wrote: On 01.02.2013 14:54, Nicholas Humfrey wrote: While the reification makes sense, we thought that it looked a bit too much like rdf:Statement. w:Berlin s:Population Berlin:Statement1 . Berlin:Statement1 rdf:type o:Statement . Perhaps you could rename o:Statement to o:Fact instead? But it's not a fact. It's a claim someone makes. That may seem like a fine distinction, but it's really fundamental to understanding how Wikidata/Wikibase is different from DBpedia, Freebase, Cyc, etc. Wikidata doesn't collect facts. It collects statements (sourced claims). -- daniel ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Data values
(If I knew the private email for Denny, I'd send this there.) Martynas, there is no mention here of XSD etc. because it is not relevant on this level of discussion. For exporting the data we will obviously use XSD datatypes. This is so obvious that I didn't think it needed to be explicitly stated. Maybe you don't realize snarky comments such as the last sentence are infinitely tiresome. It certainly demonstrates your lack of understanding of what Martynas is saying: build on others' work to the extent that you can and, if you can't, explain why not. On 21.12.2012 09:14, Denny Vrandečić wrote: Hi all, wow! Thanks for all the input. I read it all through, and am trying to digest it currently into a new draft of the data model for the discussed data values. I will try to address some questions here. Please be kind if I refer to the wrong person at one place or the other. Whenever I refer to the current model, I mean the version as it was during this discussion http://meta.wikimedia.org/w/index.php?title=Wikidata/Development/Representing_values&oldid=4859586 [2] The term updated model refers to the new one, which is not published yet. I hope I can do that soon. == General comments == I want to remind everyone of the Wikidata requirements: http://meta.wikimedia.org/wiki/Wikidata/Notes/Requirements [3] Here especially: * The expressiveness of Wikidata will be limited. There will always be examples of knowledge that Wikidata will not be able to convey. We hope that this expressiveness can increase over time. * The first goal of Wikidata is to serve actual use cases in Wikipedia, not to enable some form of hypothetical perfection in knowledge representation. * Wikidata has to balance ease of use and expressiveness of statements. The user interface should not get complicated to merely cover a few exceptional edge cases. * What is an exceptional case, and what is not, will be defined by how often they appear in Wikipedia.
Instead of anecdotal evidence or hypothetical examples we will analyse Wikipedia and see how frequent specific cases are. In general this means that we cannot express everything that is expressible. A statement should not be intended to reflect the source as closely as possible, but rather to be *supported* by the source. I.e. if the source says "He died during the early days of 1876", this would also support a statement like "died in - 19th century". It does not have to be more exact than that. Martynas, there is no mention here of XSD etc. because it is not relevant on this level of discussion. For exporting the data we will obviously use XSD datatypes. This is so obvious that I didn't think it needed to be explicitly stated. Tom, thanks for the links to EDTF and the Freebase work, this was certainly very enlightening. Friedrich, the term "query answering" simply means the ability to answer queries against the database in Phase 3, e.g. the list of cities located in Ghana with a population over 25,000 ordered by population. A query system that deals well with intervals -- I would need a pointer for that. For now I was always assuming to use a single value internally to answer such queries. If the value is 90+-20, then a query for 100 would not contain that result. Sucks, but I don't know of any better system. We do not anywhere rely on floats (besides in internal representations), but always use decimals. Floats have some inherent problems in representing some numbers that could be interesting for us. == Time == Marco suggested to N/A some values of dates. This is partially the idea of the precision attribute in the current data. Anything below the precision would be N/A. It would not be possible to N/A the year when the month or day is known though, as Friedrich suggested. Friedrich also suggested to use a value like "April-July 1567" for uncertain time instead of the current precision model.
I prefer his suggestion to the current one and will include that in the updated model. The accuracy though has to be in the unit given by the precision; we cannot just take seconds, since there is no well-defined number of seconds in a month or a year, or, almost anything, actually. Note though that the intervals that Sven mentioned -- useful for e.g. reigns or office periods -- are different beasts and should have uncertainty entries both for the start and end date. We have intervals in the data model, and plan to implement them later -- it is just that they are not such a high priority (dates appear 2.5 million times in infoboxes, intervals only 80,000 times). I am completely unsure what to do with a value like "about 1850" if not to interpret it as something like 1850 +- 50, but Sven seems to dislike that. == Location == After the discussion, I decided to drop altitude / elevation from the Geolocation. It can still be expressed through a property, and have all the flexibility of a normal property (including qualifiers etc.)
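The "about 1850" reading Denny describes can be sketched as a simple interval check. This is illustrative only: the function name and the +/- 50 interpretation are assumptions taken from the paragraph above, not the Wikidata data model.

```python
# Sketch: interpreting an uncertain date like "about 1850" as
# value +/- accuracy, in the unit given by the precision (years here).
# Names and the +/- 50 reading are illustrative assumptions.

def supports(value, accuracy, query):
    """True if the query value falls inside the uncertainty interval."""
    return value - accuracy <= query <= value + accuracy

# "about 1850" read as 1850 +/- 50:
print(supports(1850, 50, 1830))  # True
print(supports(1850, 50, 1790))  # False
```

Storing the accuracy alongside the value is what lets a query decide whether an uncertain date "supports" a given year, rather than collapsing it to a single point.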
Re: [Wikidata-l] Data values
The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and xsd:maxExclusive facets are absolute expressions, not relative +/- expressions, in order to accommodate fast queries. These four facets permit specification of ranges with an unspecified median and ranges with a specified mode, inclusive or exclusive of endpoints, a six-fer. For these reasons I believe the XSD approach is superior for specifying value sets when compared to storing the dispersion factors themselves, e.g. the 3 of +/- 3. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
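A rough sketch of the point being made, under assumed data: converting a +/- dispersion into absolute bounds once, at write time, lets a range query run against a plain sorted column instead of re-deriving the interval per row. The table, names, and query are hypothetical.

```python
# Sketch: store absolute bounds (in the style of xsd:minInclusive /
# xsd:maxInclusive) instead of the "+/- 3" dispersion itself, so that
# range queries can use an ordinary index. Data is illustrative.

def to_bounds(value, plus_minus):
    """Convert value +/- dispersion to absolute inclusive bounds."""
    return (value - plus_minus, value + plus_minus)

rows = [("city_a", 90, 20), ("city_b", 150, 5)]  # (name, value, +/-)
# Precompute bounds once, at write time:
indexed = [(name, *to_bounds(v, pm)) for name, v, pm in rows]

# "population possibly >= 100": a plain comparison on the stored maxima.
hits = [name for name, lo, hi in indexed if hi >= 100]
print(hits)  # ['city_a', 'city_b']
```

With the bounds materialized, the 90 +/- 20 row is still found by the query for 100, which the single-internal-value scheme discussed earlier would miss.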
Re: [Wikidata-l] Data values
I detect a need to characterize the range expression - the most important aspect of which is whether the range is complete, or whether it excludes (equal) tails on each end. XSD presumes a complete range is being specified, not a subset; is that the issue you're raising? Could an additional facet for percentage-tails-excluded effectively communicate this estimate? On 21.12.2012 10:41, Gregor Hagedorn wrote: On 21 December 2012 19:36, jmccl...@hypergrove.com wrote: The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and xsd:maxExclusive facets are absolute expressions, not relative +/- expressions, in order to accommodate fast queries. These four facets permit specification of ranges with an unspecified median and ranges with a specified mode, inclusive or exclusive of endpoints, a six-fer. For these reasons I believe the XSD approach is superior for specifying value sets when compared to storing the dispersion factors themselves, e.g. the 3 of +/- 3. yes, provided they are actually tied to the semantics of min. and maximum, which the xsd examples are. As long as the semantics of the proposed value bracketing in Wikidata is unknown, their use is questionable if not impossible. If I know something is plus/minus 2 s.d. or plus/minus 2 s.e. or 10 to 90 % percentile ... I again can use them to the benefit of the query system. But not without. Gregor ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Data values
(Proposal 3, modified) * value (xsd:double or xsd:decimal) * unit (a wikidata item) * totalDigits (xsd:smallint) * fractionDigits (xsd:smallint) * originalUnit (a wikidata item) * originalUnitPrefix (a wikidata item) JMc: I rearranged the list a bit and suggested simpler naming. JMc: Is not originalUnitPrefix directly derived from originalUnit? JMc: It may be more efficient to store, not reconstruct, the original value. It may even be better to store the original value somewhere else entirely, earlier in the process, e.g. within the context that you indicate would be worthwhile to capture, because I wouldn't expect a lot of retrievals, but you certainly anticipate usage patterns better than I do. How about just: Datatype: .number (Proposal 4) - :value (xsd:double or xsd:decimal) :unit (a wikidata item) :totalDigits (xsd:smallint) :fractionDigits (xsd:smallint) :original (a wikidata item that is a number object) On 20.12.2012 03:08, Gregor Hagedorn wrote: On 20 December 2012 02:20, jmccl...@hypergrove.com wrote: For me the question is how to name the precision information. Do not the XSD facets totalDigits and fractionDigits work well enough? I mean Yes, that would be one way of modeling it. And I agree with you that, although the xsd attributes originally are devised for datatypes, there is nothing wrong with re-using them for quantities and measurements. So one way of expressing a measurement with significant digits is: (Proposal 1) * normalizedValue * totalDigits * fractionDigits * originalUnit * normalizedUnit To recover the original information (e.g. that the original value was in feet with a given number of significant digits) the software must convert normalizedUnit to originalUnit, scale to totalDigits with fractionDigits, calculate the remaining powers of ten, and use some information that must be stored together with each unit whether this then should be expressed using an SI unit prefix (the Exa, Tera, Giga, Mega, kilo, hekto, deka, centi, etc.).
Some units use them, others do not, and some units use only some. Hektoliter is common, hektometer would be very odd. This is slightly complicated by the fact that for some units prefix usage in lay topics differs from scientific use. If all numbers were expressed ONLY as total digits with fraction digits and unit-prefix, i.e. no power-of-ten exponential, the above would be sufficiently complete. However, without additional information it does not allow one to recover the entry: 100,230 * 10^3 tons (value 1.0023e8, 6 total, 3 fractional digits, original unit tons, normalized unit gram) I had therefore made (on the wiki) the proposal to express it as: (Proposal 2) * normalizedValue * significantDigits (= and I am happy with totalDigits instead) * originalUnit * originalUnitPrefix * normalizedUnit However I see now that the analysis was wrong, indeed it needs fractionDigits in addition to totalDigits, else a similar problem may occur, i.e. the distribution of the total order of magnitude of the number between non-fractional digits, fractional digits, powers of 10 and powers-of-10-expressed through SI units is still not unambiguous. So the minimal representation seems to be: (Proposal 3) * normalizedValue (xsd:double or xsd:decimal) * totalDigits (xsd:smallint) * fractionDigits (xsd:smallint) * originalUnit (a wikidata item) * originalUnitPrefix (a wikidata item) * normalizedUnit (a wikidata item) Adding the originalUnitPrefix has the advantage that it gathers knowledge from users and data creators or resources about which unit prefix is appropriate in a given context. I view the current Wikidata plan to solve this problem with heuristics very critically; I do not see the data set that sufficiently tests the heuristics yet. Gathering information from data entered and creating formatting heuristics modules over the coming years (instead of weeks) will be valuable for reformatting. Proposal 3 allows gathering this information.
Gregor Note 1: The question of other means to express accuracy or precision, e.g. by error margins, statistical measures of spread such as variance, confidence intervals, percentiles, min/max etc. is not yet covered. Given the present discussion, this should probably be separately agreed upon. Note 2: Wikipedia Infoboxes may desire to override it; this is for data entry, review, curation, and a default display where no other is defined ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
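The recovery step Proposal 3 requires (convert the normalized unit back to the original unit, then reapply the digit counts) can be sketched as follows. The conversion table and the `recover` helper are illustrative assumptions; the proposal itself only names the stored attributes.

```python
# Sketch of recovering an original presentation from Proposal 3
# fields. TO_GRAMS and recover() are hypothetical helpers; only the
# attribute names (totalDigits, fractionDigits, originalUnit) come
# from the proposal.

TO_GRAMS = {"ton": 1_000_000, "kilogram": 1_000, "gram": 1}

def recover(normalized_grams, total_digits, fraction_digits, original_unit):
    """Rescale to the original unit and reapply the digit counts."""
    value = normalized_grams / TO_GRAMS[original_unit]
    text = f"{value:.{fraction_digits}f}"   # fractionDigits fixes decimals
    shown = len(text.replace(".", "").lstrip("0"))
    assert shown <= total_digits, "more digits shown than are significant"
    return f"{text} {original_unit}"

# 1.2 t stored normalized in grams, with 2 total / 1 fractional digit:
print(recover(1_200_000, 2, 1, "ton"))  # 1.2 ton
```

Note that both digit counts are needed, as Gregor observes: `fractionDigits` alone fixes the decimal places, while `totalDigits` constrains how much of the magnitude is significant at all.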
Re: [Wikidata-l] Data values
I suspect what Martynas is driving at is that XMLS defines **FACETS** for its datatypes - accepting those as a baseline, and then extending them to your requirements, is a reasonable, community-oriented process. However, wrapping oneself in the flag of open development is to me unresponsive to a simple plea to stand on the shoulders of giants gone before, to act in a responsible manner cognizant of the interests of the broader community. And personally I have to say I don't like the word "clinging" -- clearly a red flag meant to inflame if not insult. This is no place for that! On 19.12.2012 09:47, Sven Manguard wrote: My philosophy is this: We should do whatever works best for Wikidata and Wikidata's needs. If people want to reuse our content, and the choices we've made make existing tools unworkable, they can build new tools themselves. We should not be clinging to what's been done already if it gets in the way of what will make Wikidata better. Everything that we make and do is open, including the software we're going to operate the database on. Every WMF project has done things differently from the standards of the time, and people have developed tools to use our content before. Wikidata will be no different in that regard. Sven On Wed, Dec 19, 2012 at 12:27 PM, Martynas Jusevičius marty...@graphity.org wrote: Denny, you're sidestepping the main issue here -- every sensible architecture should build on as much previous standards as possible, and build own custom solution only if a *very* compelling reason is found to do so instead of finding a compromise between the requirements and the standard. Wikidata seems to be constantly doing the opposite -- building a custom solution with whatever reason, or even without it. This drives the compatibility and reuse towards zero.
This thread originally discussed datatypes for values such as numbers, dates and their intervals -- semantics for all of those are defined in XML Schema Datatypes: http://www.w3.org/TR/xmlschema-2/ [1] All the XML and RDF tools are compatible with XSD; however, I don't think there is even a single mention of it in this thread? What makes Wikidata so special that its datatypes cannot build on XSD? And this is only one of the issues, I've pointed out others earlier. Martynas graphity.org [2] On Wed, Dec 19, 2012 at 5:58 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Martynas, could you please let me know where RDF or any of the W3C standards cover topics like units, uncertainty, and their conversion. I would be very much interested in that. Cheers, Denny 2012/12/19 Martynas Jusevičius marty...@graphity.org Hey wikidatians, occasionally checking threads in this list like the current one, I get a mixed feeling: on one hand, it is sad to see the efforts and resources wasted as Wikidata tries to reinvent RDF, and now also triplestore design as well as XSD datatypes. What's next, WikiQL instead of SPARQL? On the other hand, it feels reassuring as I was right to predict this: http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html [3] http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html [4] Best, Martynas graphity.org [2] On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: On 19.12.2012 14:34, Friedrich Röhrs wrote: Hi, Sorry for my ignorance, if this is common knowledge: What is the use case for sorting millions of different measures from different objects? Finding all cities with more than 10 inhabitants requires the database to look through all values for the property population (or even all properties with countable values, depending on implementation and query planning), compare each value with 10 and return those with a greater value.
To speed this up, an index sorted by this value would be needed. For cars there could be entries by the manufacturer, by some car-testing magazine, etc. I don't see how this could be adequately represented/sorted by a database-only query. If this cannot be done adequately on the database level, then it cannot be done efficiently, which means we will not allow it. So our task is to come up with an architecture that does allow this. (One way to allow scripted queries like this to run efficiently is to do this in a massively parallel way, using a map/reduce framework. But that's also not trivial, and would require a whole new server infrastructure). If, however, this is necessary, I still don't understand why it must affect the datavalue structure. If an index is necessary it could be done over a serialized representation of the value. Serialized can mean a lot of things, but an index on some data blob is only useful for exact matches, it cannot be used for greater/lesser queries. We need to map
Re: [Wikidata-l] .name = text property
Using the dotted notation, XSD datatype facets such as below can be specified easily as properties using a simple colon: Property: .anyType:equal - (sameAs equivalent) redirect to page/object with actual numeric value Property: .anyType:ordered - a boolean property Property: .anyType:bounded - a boolean property Property: .anyType:cardinality - a boolean property Property: .anyType:numeric - a boolean property Property: .anyType:length - number of chars allowed for value Property: .anyType:minLength - min nbr of chars for value Property: .anyType:maxLength - max nbr of chars for value Property: .anyType:pattern - regex string Property: .anyType:enumeration - specified values comprising value space Property: .anyType:whiteSpace - preserve or replace or collapse Property: .anyType:maxExclusive - number for an upper bound Property: .anyType:maxInclusive - number for an upper bound Property: .anyType:minExclusive - number for a lower bound Property: .anyType:minInclusive - number for a lower bound Property: .anyType:totalDigits - number of total digits Property: .anyType:fractionDigits - number of digits in the fractional part of a number An anonymous object is used to represent namespace-qualified (text url) values, e.g. rdf:about: Property: .:rdf:about - this is a .url value for an RDF about property for a page/object Property: .:skos:prefLabel - this is a .name value for a page/object I suggest that properties for precision can be found in the XSD facets above. - john On 19.12.2012 12:41, jmccl...@hypergrove.com wrote: Here's a suggestion. Property names for numeric information seem to be on the table -- these should be viewed systematically, not haphazardly. If all text properties had a dotted lower-case name, life would be simpler in SMW land all around, and maybe Wikidata land too. All page names have an initial capital as a consequence of requiring all text properties to be named with an initial period followed by a lower-case letter.
The SMW tool mandates the properties from which all derive: .text, .string and .number are basic (along with others like .page). Then, strings have language-based subproperties and number expression subproperties, and numbers have XSD datatype subproperties, which in turn have SI unit type subproperties, and so on. Here's a Consolidated Listing of ISO 639, ISO 4217, SI Measurement Symbols, and World Time Zones [2] to illustrate that it is possible to create a unified string- and numeric-type property name dictionary across a wide swath of the standards world. The document lists a few overlapping symbols that were then re-assigned to other symbols. Adopting a dotted-name text-property naming convention can lead to easier user interfaces too, for query forms at least, plus optimizations that an SMW query engine can exploit. What is meant by these expressions seems pretty natural to most people: Property: Height - the value is a wiki pagename or objectname for a height numeric object Property: .text - (on Height) the value is text markup associated with the Height object Property: .string - (on Height) the value is text non-markup data for the Height object Property: .ft - (on Height) the value is number of feet associated with the Height object Property: Height.text - the value is text markup associated with an anonymous Height object Property: Height.string - the value is a string property of an anonymous Height object Property: Height.ft - the value is a feet property of an anonymous Height object [1] http://www.hypergrove.com/Publications/Symbols.html ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l [2] http://www.hypergrove.com/Publications/Symbols.html
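To make the facet listing above concrete, here is a sketch of how such facet properties might be checked against a value. The dict-based storage and the `validate` helper are hypothetical; only the facet names (minInclusive, maxExclusive, pattern) come from XSD.

```python
# Sketch: validating a value against XSD-style facet properties like
# those listed above. Storage and helper are hypothetical; facet
# names follow the suggested .anyType:* convention.
import re

def validate(value, facets):
    """Check a value against a dict of XSD-style facets."""
    if "minInclusive" in facets and value < facets["minInclusive"]:
        return False
    if "maxExclusive" in facets and value >= facets["maxExclusive"]:
        return False
    if "pattern" in facets and not re.fullmatch(facets["pattern"], str(value)):
        return False
    return True

# e.g. a hypothetical Height property constrained to [0, 10000):
height_facets = {"minInclusive": 0, "maxExclusive": 10000}
print(validate(1234, height_facets))   # True
print(validate(-5, height_facets))     # False
```

Because the facet names are standard, the same checking logic applies uniformly to any property that declares them, which is the appeal of reusing the XSD vocabulary rather than inventing per-property rules.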
Re: [Wikidata-l] Data values
Totally agree - hopefully XSD facets provide a solid start to meeting those concrete requirements - thanks. On 19.12.2012 14:09, Gregor Hagedorn wrote: On 19 December 2012 20:01, jmccl...@hypergrove.com wrote: Hi Gregor - the root of the misconception I likely have about significant digits and the like, is that such is one example of a rendering parameter not a semantic property. It is about semantics, not formatting. In science and engineering, the number of significant digits is not used to right-align numbers, but to semantically indicate the order of magnitude of the accuracy and/or precision of a measurement or quantity. Thus, the weight of a machine can be given as 1.2 t (exact to +/- 50 kg), 1200 kg (+/- 1 kg), or 1,200,000 g. This is not part of IEEE floating point numbers, which always have the same type-dependent precision or number of significant digits, regardless of whether this is semantically justified or not. An IEEE 754 standard double always has about 16 decimal significant digits, i.e. the value 1.2 tons will always be given as 1.200 tons. This is good for calculations, but lacks the information for final rounding. Gregor ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Data values
For me the question is how to name the precision information. Do not the XSD facets totalDigits and fractionDigits work well enough? I mean .number:totalDigits contains a positive power of ten for precision .number:fractionDigits contains a negative power of ten for precision The use of the word datatype is always interesting, as if it were somehow organically different from the measurement to which it's related. Both are resources with named properties - what are those names? Certain property names derived from international standards should be considered builtin to whatever foundation the implementing tool provides. I suggest that XSD names be used at least for concepts that appear to be the same, with or without the xsd: xml-namespace prefix. But the word datatype fascinates me even more ever since SMW internalized the Datatype namespace. Because to me RDF made an error back when the rdf:type property got the range Class, when it should have been Datatype (though politics got in the way!) It gets more twisted, as now Category is the chosen implementation of rdfs:Class. The problem that presents is that categories are lists and a class (that is, rdf:type value) is, for some singular, and for others a plural, concept or label. Pure semantic mayhem. I'm happy SMW internalized the datatype namespace to the extent it maps to its software, chiefly because it clarifies that a standard Type namespace is needed -- which contains singular noun phrases -- which is the value range for rdf:type (if you will) properties. All Measurement types (e.g. Feet, Height, Lumens) would be represented there too, like any other class, with their associated properties that (in the case of numerics) would include .totalDigits and .fractionDigits. Going this route -- establishing a standard Type namespace -- would allow wikis to have a separate vocabulary of singular noun phrases not in the Category namespace.
The ultimate goal is to associate a given Type with its implementation as a wiki namespace, subpage or subobject; the Category namespace itself is already overloaded to handle that task. -john On 19.12.2012 14:50, Gregor Hagedorn wrote: totally agree - hopefully XSD facets provide a solid start to meeting those concrete requirements they don't. They allow defining derived datatypes and thus apply to the datatype, not the measurement. Different measurements of the same datatype may be of different precision. --gregor ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] qudt ontology facets
The NIST ontology defines 4 basic classes that are great: qudt:QuantityKind [11], qudt:Quantity [12], qudt:QuantityValue [13], qudt:Unit [14], but the property set leaves me a bit thirsty. Take Area as an example. I'd like to reference properties named .ft2 and .m2 so that, for instance, an annotation might be [[Leasable area.ft2::12345]]. To state the precision applicable to that measurement, the annotation might be [[Leasable area.ft2:fractionDigits :: 0]] to indicate, say, rounding. However, in the NIST ontology, there is no ft2 property at all -- this is an SI unit though, so it seems identifying first the system of measurement units, and then the specific measurement unit is not a great idea, because these notations are then divorced from the property name itself, a scenario guaranteed to cause more user errors and omissions, I think. Someone mentioned uncertainty facets, so I suggest these from the qudt ontology: Property: .anyType:relativeStandardUncertainty Property: .anyType:standardUncertainty Other facets noted might include Property: .anyType:abbreviation Property: .anyType:description Property: .anyType:symbol -john On 19.12.2012 08:10, Herman Bruyninckx wrote: On Wed, 19 Dec 2012, Denny Vrandečić wrote: Martynas, could you please let me know where RDF or any of the W3C standards cover topics like units, uncertainty, and their conversion. I would be very much interested in that. NIST has created a standard in OWL: QUDT - Quantities, Units, Dimensions and Data Types in OWL and XML: http://www.qudt.org/qudt/owl/1.0.0/index.html [5] I fully share Martynas' concerns: most of the problems that are being discussed in this thread (and that are very relevant and interesting) should not be solved with an object oriented approach (that is, via properties of objects, and inheritance) but by semantic modelling (that is, composition of knowledge).
For example, one single database representation of a unit can have multiple displays depending on who wants to see the unit, and in which context; the viewer and the context are rather simple to add via semantic primitives. For example, the Topic Map semantic standard would fit here very well, in my opinion: http://en.wikipedia.org/wiki/Topic_map [6]. Cheers, Denny Herman http://people.mech.kuleuven.be/~bruyninc Tel: +32 16 328056 Vice-President Research euRobotics http://www.eu-robotics.net [7] Open RObot COntrol Software http://www.orocos.org [8] Associate Editor JOSER http://www.joser.org [9], IJRR http://www.ijrr.org [10] ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1] Links: -- [1] https://lists.wikimedia.org/mailman/listinfo/wikidata-l [2] http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html [3] http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html [4] http://wikimedia.de [5] http://www.qudt.org/qudt/owl/1.0.0/index.html [6] http://en.wikipedia.org/wiki/Topic_map [7] http://www.eu-robotics.net [8] http://www.orocos.org [9] http://www.joser.org [10] http://www.ijrr.org [11] http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#QuantityKind [12] http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#Quantity [13] http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#QuantityValue [14] http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#Unit
Re: [Wikidata-l] Schema.org markup in Wikidata concepts.
Hi Max - why do you say now that Wikidata breaks the assumption that pages store wikitext ? Thanks On 14.09.2012 08:06, Klein,Max wrote: Hello all, I was wondering, now that Wikidata breaks the assumption that pages store wikitext, has it been considered that Wikidata concept pages could actually use schema.org markup? Wikidata could possibly read in data from other web pages which use schema.org markup, but what about the Wikidata pages themselves having the markup? Max Klein Wikipedia in Residence kle...@oclc.org +17074787023 ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] weekly summary #23
Hi Lydia - could you elaborate about this item in your excellent report!. * Added Type entity type, plus skeletons for its associated Content, ContentHandler, ViewAction, EditAction and UndoAction I'm particularly keen to know how this facility relates to fielding application-level ontologies. thanks - john On 14.09.2012 07:40, Lydia Pintscher wrote: Heya folks :) Here's what's been happening over the last week. (wiki version at http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2012_09_14) = Development = * Made new CreateItem special-page (also working JavaScript-less) * Special:ItemDisambiguation got lots of love and awesome autocompletion * Special:ItemByTitle also got lots of love and awesome autocompletion * The client-wiki now gets notified if a connecting Sitelink gets removed * Wrote Selenium tests for client-code * Editing/adding site-links will display a proper link to the page again * Removed auto-expansion for description/label input fields * Tested setting Mediawiki to use HTML5 to make sure it works as it should * Finished up work on new sites functionality in Wikibase and moved it as a patch to core (which is still awaiting review) * Worked on ValueHandler extension which will be used for our data values * Added Type entity type, plus skeletons for its associated Content, ContentHandler, ViewAction, EditAction and UndoAction * Implemented safeguards against text-level editing of data pages * Allow Sitelinks to Wikipedia only (fixed regression) * Wrote permission checks and edit conflict detection for ApiSetItem, undo/restore, etc. 
* Fixed display of deleted revisions of data items * Added --verbose option to pollForChanges maintenance script to show change summary * Bug fixes and improvements for right-to-left languages * Updated demo system http://wikidata-test.wikimedia.de/ * The long format (more like the json output format) for wbsetitem API module is now alive (http://www.mediawiki.org/wiki/Extension:Wikibase/API#New_long_format) See http://meta.wikimedia.org/wiki/Wikidata/Development/Current_sprint for what we're working on next. You can follow our commits at https://gerrit.wikimedia.org/r/#/q/(status:open+project:mediawiki/extensions/Wikibase)+OR+(status:merged+project:mediawiki/extensions/Wikibase),n,z and view the ones awaiting review at https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Wikibase,n,z = Discussions/Press = * Sent note about Wikidata to all Wikimedia projects via Global Message Delivery - generated quite some feedback on Meta * Started page to coordinate discussions about bots around Wikidata: http://meta.wikimedia.org/wiki/Wikidata/Bots = Events = see http://meta.wikimedia.org/wiki/Wikidata/Events * State of the Map * Health 2.0 Berlin meetup * upcoming: Software Freedom Day = Open Tasks for You = * New stuff to hack on: https://bugzilla.wikimedia.org/buglist.cgi?keywords=need-volunteerkeywords_type=allwordsresolution=---resolution=LATERresolution=DUPLICATEquery_format=advancedcomponent=WikidataClientcomponent=WikidataRepoproduct=MediaWiki%20extensionslist_id=145856 Anything to add? Please share! :) Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. 
___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Schema.org markup in Wikidata concepts.
Thanks for the link. So Max's original question was whether a content-typed page containing (JSON-? XML-? RDF-?) encoded data can have markup conforming to schema.org ontologies? It seems that if XML/RDF is a supported content type then an opening does exist for such markup to be implemented, though only by individual Wikipedia sites, because it seems clear that Wikidata will not process such RDF/XML in order to create Wikidata statements that can be displayed within infoboxes. It does raise the question of whether Wikidata defines a content type for its JSON markup, that is, JSON markup that conforms to its SNAKs ontology. Is there a list of the content types somewhere? I looked but could not find one in ContentHandler.php. thanks - john On 14.09.2012 11:00, Jeremy Baron wrote: On Fri, Sep 14, 2012 at 5:56 PM, jmccl...@hypergrove.com wrote: Hi Jeremy and Max, [...] Would you please provide a link or pointer to documentation about these types? Maybe some of these questions are answered at https://www.mediawiki.org/wiki/ContentHandler -Jeremy ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Schema.org markup in Wikidata concepts.
Your response doesn't direct me to a list of content types (supported formats). But you know, I'd suggest the answer to your question wrt schema.org is found in the published RDF [1]:

w:Berlin s:Population Berlin:Statement1 .
Berlin:Statement1 rdf:type o:Statement .

"Note that we introduced two new properties, s:Population (s: like statement) and v:Population (v: like value) instead of the original p:Population property (mind the namespaces). The original property, in the p: namespace, is a datatype property that connects items directly with an integer value (i.e. the domain is item). The s: property, on the other hand, is an object property connecting the item with the statement about it." [1]

[1] https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers

The problem is this: s:Population is not published, that I can find. So is the question whether s:Population will be mapped to some schema.org/Population rdf:Property definition, if there were such a thing defined at schema.org? But looking at schema.org, population is usually defined as a p:Population kind-of-thing, not an s:Population kind-of-thing. In Wikidata, p:Population is an rdf:Datatype, s:Population an rdf:Property. Schema.org's population is a datatype, not a class in an ontology, so they're not fitting together. But as I've said, it's a problem that s:Population is not defined. Is s:Population not closer to a collection, a bag that has restricted content -- that is, a bag of statements (see quotation above) -- than it is to a functional definition of Population as something more like a group of living individuals, alive as per some specified context? Where does THIS concept of population fit with <rdf:Property rdf:about="wikidata/s:Population"/>? Hard to avoid that s:Population basically is just short-hand for Population Statements.
Using shorthand certainly has precedent, given that q:Qualifier is short-hand for a category, an owl:Class, and thus q:Draft is (very acceptable) short-hand for Draft Things... So we need a crisp definition of s:Population please! Thanks - john On 14.09.2012 11:47, Jeremy Baron wrote: On Fri, Sep 14, 2012 at 6:37 PM, jmccl...@hypergrove.com wrote: I looked but could not find one in ContentHandler.php. thanks - john The links at the bottom of the page I just linked to work for me. -Jeremy ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
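For what it's worth, the s:/v: split quoted above is easy to mimic in a few lines of Python. The sketch below uses the names from the quoted RDF draft (s:Population, v:Population, o:Statement); the population figure and the reference node are my own placeholders, not Wikidata content.

```python
# Sketch of the statement-node pattern quoted above: s:Population links
# the item to a Statement node rather than to a bare integer, so the
# statement can carry qualifiers such as a reference. The numeric value
# and the reference node are illustrative placeholders.
triples = {
    ("w:Berlin", "s:Population", "Berlin:Statement1"),
    ("Berlin:Statement1", "rdf:type", "o:Statement"),
    ("Berlin:Statement1", "v:Population", 3500000),
    ("Berlin:Statement1", "ex:reference", "ex:SomeCensus"),
}

def statement_values(item, prop):
    """Follow s:-links to statement nodes, then read their v:-values."""
    nodes = [o for s, p, o in triples if s == item and p == "s:" + prop]
    return [o for n in nodes
            for s, p, o in triples if s == n and p == "v:" + prop]
```

The indirection is exactly what makes the s: property an object property: the thing it points at is a node you can say more about, which a direct p:Population integer cannot be.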
Re: [Wikidata-l] Wikidata, ISO and chembox
Hi Nadja - To my knowledge ISO has not published, nor is intending to publish, instances of topic maps representing the content of their numerous publications, using either their (ISO's) standard for Topic Maps (ISO/IEC 13250) or any other ISO or non-ISO standard. Forgive me if I ever gave that impression. You provided a nice link to unofficial topic map standards, thank you. Here are others: http://www.garshol.priv.no/download/tmlinks.html [3]. Theoretically W3C is engaged in mapping Topic Maps to an RDF representation, but I've not seen the fruits of that (important!) work. I hope when such is published all questions about copyrights will be concomitantly resolved. Regards - john

On 05.09.2012 03:17, Nadja Kutz wrote: John McClure wrote: "Nadja conflated our asking about ISO Topic Maps as a base design standard with incorporating ALL ISO STANDARDS EVER PUBLISHED into the wikidata database" Their publications are topic maps? Because there exists an ISO Topic Map metamodel? Wikidata people have changed their mind with RDF; that is, I think it may just be quite a bit more messy; moreover my impression is that there is more RDF linked data (see e.g. ...). Moreover I didn't say to use ALL ISO STANDARDS EVER PUBLISHED but suggested to use these: http://article.gmane.org/gmane.org.wikimedia.wikidata/576 [1] This doesn't exclude that one could use in the end all ISO standards ever published, but one could do so incrementally. Should I rephrase it and ask again?
(The reference to the previous email and link:) I restate the questions of my posting: http://article.gmane.org/gmane.org.wikimedia.wikidata/576 [1] The ISO copyright information: http://www.iso.org/iso/copyright.htm [2] Links: -- [1] http://article.gmane.org/gmane.org.wikimedia.wikidata/576 [2] http://www.iso.org/iso/copyright.htm [3] http://www.garshol.priv.no/download/tmlinks.html ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata, ISO and chembox
Hi Denny - your statement, that SNAKS can be related to RDF or topic maps, is interesting to me, particularly your reference to topic maps. I tend to interpret this as saying you believe SNAKS implements the topic map data model, represented using RDF triples -- that SNAKS is informed by, or itself anticipates, the W3C's RDF mapping for Topic Maps. At some leisurely moment -- to the extent you have any! -- please flesh that out. Thanks - john On 05.09.2012 04:01, Denny Vrandečić wrote: The discussion in this thread so far has centered around the data model that is described here: http://meta.wikimedia.org/wiki/Wikidata/Data_model [1] This data model relates to RDF or topic maps. Links: -- [1] http://meta.wikimedia.org/wiki/Wikidata/Data_model ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata, ISO and chembox
Luca, You're right, and I apologize that I steered the discussion from content classification back to an old wikidata data modelling question -- SNAKS vs Topic Maps. Because I am ignorant about any ISO classification standard whatsoever I thought the old bugaboo modelling discussion was being resurrected, but I was wrong. On 05.09.2012 06:35, Luca Martinelli wrote: Dear all, sorry but I think I didn't correctly got the point of the whole thing. Probably, I was overestimating my English competence, or my free-licensing competence, or both. So, without ANY intention of being rude, or even polemical, I would like to ask: what is this discussion about, again? If I got it right, someone expressed his/her doubts about using ISO standards in classifying data on Wikidata because of [this point may be challenged, but this is what I understood] potential ISO copyright issues. Now, the points are: a) Is my guess correct? If no, what is the point this discussion is about? b) Is there anyone who could answer this doubt, whatever it is? Just trying to follow this thread, nothing more. Thank you. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata, ISO and chembox
Nadja, "Why is the topic map standard at http://www.topicmaps.org/standards/ an unofficial topic map standard?" Links can be found there to working technical committee reports, which are generally re-titled as "standards" once voted on and accepted by ISO. These TC reports vary little from what is published by the ISO community as a "standard".

"ISO/IEC 19788-1:2011 is information-technology-neutral" means that information elements defined by the standard for learning resources can be represented using a concrete implementation syntax such as provided by RDF, or by Topic Maps, or by SMW syntax, or by JSON, or by a variety of others. For instance, a data element defined by 19788 might be "Learning Resource", which has an attribute called "Minimum Education Level". In RDF, a given LearningResource would be exchanged via:

<rdf:RDF xmlns:iso="http://iso.org/19788/" xmlns:dc="dublin core">
  <iso:LearningResource rdf:about="LR12345">
    <dc:title>Lithium</dc:title>
    <iso:minEducLevel>12th grade</iso:minEducLevel>
  </iso:LearningResource>
</rdf:RDF>

The owl:Class definition for iso:LearningResource and rdf:Property definition for iso:minEducLevel are not shown. In Topic Maps, exchange is had via:

<tm:topicmap xmlns:tm="http://iso.org/13250/" xmlns:dc="dublin core">
  <tm:topic>
    <tm:itemIdentity>LR12345</tm:itemIdentity>
    <tm:name>
      <tm:value>Lithium</tm:value>
    </tm:name>
    <tm:occurrence>
      <tm:type>
        <tm:resourceRef href=""/>
      </tm:type>
      <tm:resourceData>12th grade</tm:resourceData>
    </tm:occurrence>
  </tm:topic>
</tm:topicmap>

Markup for a "published subject identifier" -- eg http://iso.org/19788/MinimumEducationLevel -- is not shown. In SMW, the page named "Lithium" might be exported as:

<page>
  <title>Lithium</title>
  <revision>
    <text xml:space="preserve">[[Category:ISO 19788 Learning Resources]][[minEducLevel::12th grade]]</text>
  </revision>
</page>

while a page entitled "Category:ISO 19788 Learning Resources" is mapped elsewhere in the wiki to an import-able owl:Class definition, and a page entitled "Property:minEducLevel" is mapped to an import-able rdf:Property definition.
It'd be interesting to hear from the Wikidata team what the SNAK API serialization(s) would be. John On 05.09.2012 09:45, Nadja Kutz wrote: John McClure wrote: "To my knowledge ISO has not published, nor is intending to publish, instances of topic maps representing the content of their numerous publications, using either their (ISO's) standard for Topic Maps (ISO/IEC 13250), or any other ISO or non-ISO standard. Forgive me if I ever gave that impression. You provided a nice link to unofficial topic map standards, thank you. Here's others: http://www.garshol.priv.no/download/tmlinks.html." Thanks for the link. Why is the topic map standard at http://www.topicmaps.org/standards/ an unofficial topic map standard? They talk about it being the ISO 13250 standard. You had drawn my attention to the ISO, thanks for that; however the impression that they might have some standards is from their website: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=50772 where they write amongst others: "The primary purpose of ISO/IEC 19788 is to specify metadata elements and their attributes for the description of learning resources. This includes the rules governing the identification of data elements and the specification of their attributes." Since this thing costs 162 CHF I can't check what those guys are really doing there; however it would be strange to define metadata which is not for automated processing. In particular they write a little later: "ISO/IEC 19788-1:2011 is information-technology-neutral and defines a set of common approaches, i.e. methodologies and constructs, which apply to the development of the subsequent parts of ISO/IEC 19788." Would they write this if ISO/IEC 19788 was information-technology-neutral as well?
Moreover there exists also a "N2448 Summary of voting on ISO/IEC NP 18343, Learning environment profile for automated contents", which is password-protected at the webpage: http://isotc.iso.org/livelink/livelink/open/jtc1sc36 but where the word AUTOMATED appears explicitly. But of course it would be nicer to have someone who knows explicitly what exactly they are having in mind there at ISO. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] Wikidata, ISO and chembox
No sir, that is not right. As I said there is no ISO classification scheme of which I am aware. And I've said I no longer have interest in the wikidata team using ISO Topic Maps - that is a dead issue since the team declined to discuss it. *At the time I wrote the emails you referenced* I was interested in the SNAKS data model the team summarily announced as the basis for Wikidata implementation, in preparation as I recall for Wikimania. I felt the SNAKS data model was one that vaguely resembled the already peer reviewed and internationally standardized Topic Maps data model, while the SNAKS data model was surely not ever going to be either peer reviewed or standardized -- yet another discordant stovepipe not of remarkable benefit to the (LOD) technical community. Both SNAKS and the Topic Map data models (abstract syntaxes) can be serialized using the concrete RDF syntax (or so the W3 is said to be working on, for Topic Maps); both can/are serializable in concrete JSON syntax. So the copyright concern arose *directly from* that resemblance between the two data models. My suggestion *at the time* was to purchase the ISO standard and modify it as necessary; doing so would surely sidestep all concern about SNAKS infringing on the ISO's copyright of an abstract syntax that does NOT attempt to reference other ontologies, i.e., both in contrast to the RDF's data model which DOES integrate one or more ontologies. Essentially, both SNAKS and Topic Maps allow an author to define named-values without reference to any other ontology. As an aside, assigning responsibility to the WP community for how content is structured I think is so vague as to be uninformative; you provide no definition of structure -- is it the concept of ontology? Sure, by using Wikidata's parser functions (client server APIs) WPs will determine how information is *presented in their infoboxes, but the structure is surely going to be *only* that which SNAKS allows -- that is, named values. 
Whether a page's named values will be related to any ontologies remains to be seen. For instance, will Thomas Jefferson's named-values be exchangeable with FOAF, Dublin Core or SKOS processors? I dunno! best - john

On 05.09.2012 14:13, Friedrich Röhrs wrote: Hi Luca, a) as far as I have understood, Nadja Kutz and John McClure want the wikidata dev team to somehow commit to using ISO topic maps for the classification of the content of wikidata. The dev team's position is that how the content will finally be structured is not up to them but to the community, once the technical means to create the structure are there. As far as the whole argument about using or not using ISO goes, again as far as I have understood, one position is not to use them because it would force others wanting to adhere to the standard to also pay for it (pay to get the documentation about how it works) and wikimedia somehow to pay for it too, while the other position is that wikidata should use it because it's an industry standard and the money that would have to be paid wasn't all that much. Furthermore the argument is that if it's not done, some copyright could be infringed (not the ISO one but some other). There are (IMHO) a multitude of topics about this whole thing, most prominently "[Wikidata-l] [[meta:wikitopics]] updated"; it's a bit hard to follow in the archive because it stretches multiple months (starting points in case you want to read up): http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000583.html http://lists.wikimedia.org/pipermail/wikidata-l/2012-June/000624.html http://lists.wikimedia.org/pipermail/wikidata-l/2012-June/000638.html [...] b) No one has yet identified as an attorney or copyright expert; on the contrary most everyone has said they are not. Hope this helps, Friedrich On Wed, Sep 5, 2012 at 3:35 PM, Luca Martinelli martinellil...@gmail.com wrote: Dear all, sorry but I think I didn't correctly got the point of the whole thing.
Probably, I was overestimating my English competence, or my free-licensing competence, or both. So, without ANY intention of being rude, or even polemical, I would like to ask: what is this discussion about, again? If I got it right, someone expressed his/her doubts about using ISO standards in classifying data on Wikidata because of [this point may be challenged, but this is what I understood] potential ISO copyright issues. Now, the points are: a) Is my guess correct? If no, what is the point this discussion is about? b) Is there anyone who could answer this doubt, whatever it is? Just trying to follow this thread, nothing more. Thank you. -- Luca Sannita Martinelli http://it.wikipedia.org/wiki/Utente:Sannita [1] ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org [2] https://lists.wikimedia.org/mailman/listinfo/wikidata-l [3] ___ Wikidata-l mailing list
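The "named values without reference to any other ontology" point made earlier in this thread can be sketched in a few lines: an item is just a bag of property/value pairs, and nothing constrains which property names may appear. The field names below follow the general shape of the discussion, not the actual Wikibase JSON serialization.

```python
# Sketch of the "named values without an ontology" point: nothing here
# validates property names against a schema or class definition. Field
# names are illustrative, not the Wikibase JSON format.
item = {
    "id": "Q42",
    "claims": [
        {"property": "population", "value": 3500000},
        {"property": "anything-at-all", "value": "no schema stops this"},
    ],
}

def values(itm, prop):
    """All values the item carries for a given named property."""
    return [c["value"] for c in itm["claims"] if c["property"] == prop]
```

Mapping such free-form named values onto FOAF, Dublin Core or SKOS would then be an extra, optional layer, which is precisely the open question raised above.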
Re: [Wikidata-l] Wikidata, ISO and chembox
Hello, The genesis of the legal question is the thread concerning using ISO Topic Map precepts not SNAKs. Surely you know a number of individuals on this forum feel that our challenge at that time was not thoughtfully engaged. Instead we received replies focused on costs associated with ISO standards, insisting that pirate ethics were mandated by the WMF, to swat away our basic questions. For instance, Nadja conflated our asking about ISO Topic Maps as a base design standard with incorporating ALL ISO STANDARDS EVER PUBLISHED into the wikidata database. Obviously I subsequently withdrew from engagement with Wikidata when the design team failed to seriously engage in the challenge at the time about a distinct technical orientation towards transclusion instead of (imho archaic non-wiki needlessly complex) client/server apis which, still to this day, I obviously consider a flaw of the wikidata design. That said, the legal questions are likely trivial; they were and are to me merely a prop for more important, but now dead and past, issues. Wikidata has committed to specific implementations; in the interests of community I support those while hoping that (a) SMW doesn't wither and then die (b) individual WPs can somehow participate uniquely in the semantic web (c) individual WPs widely adopt what Wikidata has wrought. I do know (and am developing) an SMW-based Topic Maps extension is feasible and practicable -- the benefits of which are too obvious to ignore to those who care. party on - john On 04.09.2012 01:42, Nadja Kutz wrote: John McClure wrote: I don't think you're hearing the question. A reply y'all gave on the issue was that any standard used by Wikidata needed to be 100% open-source -- no money required as in free. Even though what is being charged by ISO to support its business model is a PITTANCE in my humble opinion... 
So, the consequent question I asked then was: if you're not going to use any (ISO or national) standard, then how can you assure the WP community that Wikidata is not violating someone's copyright(s)? Hello Lydia, Unfortunately I have to agree with John that you really do not seem to hear the question, because that is also what I read as your reply. Or was there another reply which I missed somewhere in this hard-to-browse-and-search newsgroup? Thus please explain a bit more what you mean exactly by "Unless something changed on the freedom status of the documents needed nothing changed since we discussed this last." I do not agree with John that the ISO's business model is a pittance though. That is, as I linked to in this thread: http://article.gmane.org/gmane.org.wikimedia.wikidata/618 the ISO sells their items separately and alone, e.g. the basic description of ISO inch screw threads: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51386 costs 80 CHF, so this could add up rather quickly to quite an amount of money. I thus asked (here: http://article.gmane.org/gmane.org.wikimedia.wikidata/576) whether one shouldn't ask for packages or at least for the use of the ISO classification scheme. I don't know how much copyright there is on classification schemes in general though.
(I could imagine that this is a juridical problem since big parts of a classification scheme are often trivial and unavoidable -- like "a hammer is a tool" -- and it would make no sense to give up this classification just because there was eventually some crazy copyright protection... however, maybe lawyers do now think that a hammer could also equally well be classified as a wardrobe item (given what one sees sometimes in jurisdiction I wouldn't wonder anymore).) Regarding the comment by Denny Vrandečić: "Because we ARE using standards like RDF or OWL (or HTML or URIs) which are W3C and IETF standards, and which in turn have a well documented policy regarding patents and copyrights, see e.g. http://www.w3.org/Consortium/Patent-Policy-20040205/ for W3C standards. I hope that answers that question." By looking at this page I can't really see why this is an answer to the questions; could you please explain this a bit more? thanks nad ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[Wikidata-l] extension:oversight
THIS EXTENSION IS OBSOLETE! It has been replaced by core functionality in the MediaWiki software (which was added in version 1.16.0). [1] https://www.mediawiki.org/wiki/Extension:Oversight On 17.08.2012 06:50, Lydia Pintscher wrote: Hey folks :) Here is your fresh serving of Wikidata news for the week before 2012-08-17. The wiki version is here if you prefer that: http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2012_08_17 = Development = * Installed a lot of extensions on the demo system http://wikidata-test-client.wikimedia.de/wiki/Special:Version Let us know if any important ones are missing. * Going to old versions and undoing changes works as expected * The history and recent changes show useable comments now * The Universal Language Selector is installed on the demo and replaces the Stick to that Language extension * Updated Wikibase client code for fetching and displaying links from a shared (with the repo) database table, and optionally overriding them with interwiki links from local wikitext. * Improved internationalization messages * Added selenium tests for undo, rollback, restore, diffs, old revisions, history and a few more * Setup MAC to be part of our selenium grid testing environment * Fixed many little bugs in the UI, including cross browser issues * Improved modularity of client side caching and generalized it to work with any type of entity rather than just items. * Wrote up interfaces for snaks, statements and related stuff for the second phase of Wikidata See http://meta.wikimedia.org/wiki/Wikidata/Development/Current_sprint for what we're working on next. 
You can follow our commits here: https://gerrit.wikimedia.org/r/#/q/(status:open+project:mediawiki/extensions/Wikibase)+OR+(status:merged+project:mediawiki/extensions/Wikibase),n,z and view the ones awaiting review here: https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Wikibase,n,z = Discussions/Press = * Internationalization, localization and co in preparation for deployment discussions on a rtl-language Wikipedia * Some mentions of Wikidata in relation to a re-design proposal that became pretty popular: http://www.spiegel.de/netzwelt/web/design-und-daten-wikipedia-soll-schoener-werden-a-848904.html = Events = see http://meta.wikimedia.org/wiki/Wikidata/Events * upcoming: Campus Party * upcoming: FrOSCon * We submitted a SxSW proposal. It'd be awesome if you'd vote for us. http://panelpicker.sxsw.com/vote/3710 = Other Noteworthy Stuff = * Logo is settled and all good now after some modifications: http://commons.wikimedia.org/wiki/File:Wikidata-logo-en.svg and http://commons.wikimedia.org/wiki/File:Wikidata-logo.svg We'll be making stickers and stuff next. = Open Tasks for You = * If you want to code check https://bugzilla.wikimedia.org/buglist.cgi?keywords=need-volunteerkeywords_type=allwordslist_id=134818resolution=---resolution=LATERresolution=DUPLICATEquery_format=advancedcomponent=WikidataClientcomponent=WikidataRepoproduct=MediaWiki%20extensions * Help spread the word about Wikidata in your Wikipedia if it's not being talked about there yet. * Help translate the most important pages on meta and the software on translatewiki.net Anything to add? Please share! :) Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. 
Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] demo system updated
Hello, For the current demo system, how many triple store retrievals are being performed per Statement per page? Is this more or less or the same as expected under the final design? Under the suggested pure-transclusion approach, I believe the answer is zero, since all retrievals are performed asynchronously with respect to client wiki transclusion requests. Are additional triple store retrievals (or updates) occurring? Such as ones to inform a client wikipedia about the currency of Statements previously retrieved from wikidata? In a pure-transclusion approach, such info is easy to get at: clients query the [[modification date::]] of each transclusion. Can you point me to a (transaction-level) design for keeping client wikis in sync with Statement-level wikidata content? I'm also concerned about stability and scalability. What happens to the performance of client wikis should the wikidata host be hobbled by DOS attacks, or inadvertent long-running queries, or command-line maintenance scripts, or poorly designed wikibots or, as expected, by possibly tens of thousands of wikis accessing the central wikidata host? Under the pure-transclusion approach, my concerns are not nearly the same, since all transcludable content is cached on squid servers. Thanks - john ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
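The modification-date freshness check described above can be sketched as a tiny cache: the client keeps the transcluded content together with a fetch timestamp and re-fetches only when the repository reports a newer modification date. All names here, including the fetch callback, are hypothetical illustrations rather than any actual MediaWiki interface.

```python
# Hypothetical sketch of the [[modification date::]] freshness check
# described above: a cached transclusion is reused until the repository
# reports a modification date newer than the cached copy.
from datetime import datetime, timezone

cache = {}  # title -> (content, fetched_at)

def get_transclusion(title, repo_modified, fetch):
    """Return cached content unless the repo copy is newer than it."""
    if title in cache:
        content, fetched_at = cache[title]
        if fetched_at >= repo_modified:
            return content  # still fresh: zero repository retrievals
    content = fetch(title)
    cache[title] = (content, datetime.now(timezone.utc))
    return content
```

On a cache hit the repository is never contacted at all, which is the "zero retrievals per Statement per page" property claimed for the transclusion approach.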
Re: [Wikidata-l] demo system updated
Hello Martynas, Interesting to read about ESI at http://en.wikipedia.org/wiki/Edge_Side_Includes. I recall that a query facility is intended for Phase 3, but I have no idea of the kind of store. I'd think that a quad-store is appropriate, with provenance data in mind for each triple. It'd be interesting to know whether existing quad-stores can handle the PROV namespace; I see some interesting references at the end of http://www.w3.org/TR/2012/WD-prov-aq-20120619/ to explore! Best - john On 21.06.2012 14:27, Martynas Jusevičius wrote: John, I pretty much second your concerns. Do you know Edge Side Includes (ESI)? I was thinking about using them with XSLT and Varnish to compose pages from remote XHTML fragments. Regarding scalability -- I can only see these possible cases: either Wikidata will not have any query language, or its query language will be SQL with never-ending JOINs too complicated to be useful, or it's gonna be another query language translated to SQL -- for example SPARQL, which is doable but attempts have shown it doesn't scale. A native RDF store is much more performant. Martynas graphity.org On Thu, Jun 21, 2012 at 11:34 PM, jmccl...@hypergrove.com wrote: Hello, For the current demo system, how many triple store retrievals are being performed per Statement per page? Is this more or less or the same as expected under the final design? Under the suggested pure-transclusion approach, I believe the answer is zero, since all retrievals are performed asynchronously with respect to client wiki transclusion requests. Are additional triple store retrievals (or updates) occurring? Such as ones to inform a client wikipedia about the currency of Statements previously retrieved from wikidata? In a pure-transclusion approach, such info is easy to get at: clients query the [[modification date::]] of each transclusion. Can you point me to a (transaction-level) design for keeping client wikis in sync with Statement-level wikidata content?
I'm also concerned about stability and scalability. What happens to the performance of client wikis should the wikidata host be hobbled by DOS attacks, or inadvertent long-running queries, or command-line maintenance scripts, or poorly designed wikibots or, as expected, by possibly tens of thousands of wikis accessing the central wikidata host? Under the pure-transclusion approach, my concerns are not nearly the same, since all transcludable content is cached on squid servers. Thanks - john ___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
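The quad-store idea raised above — attach provenance to each triple by recording which named graph asserted it — can be sketched in a few lines. This is a hypothetical in-memory model, not any particular store's API; the class name, the PROV-style fields, and the sample data are all illustrative.

```python
from collections import namedtuple

# A quad is a triple plus the named graph it lives in; provenance
# (who asserted it, when, from what source) hangs off the graph name.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])
Provenance = namedtuple("Provenance", ["asserted_by", "date", "source"])

class QuadStore:
    def __init__(self):
        self.quads = []
        self.graph_provenance = {}

    def assert_statement(self, s, p, o, graph, prov):
        self.quads.append(Quad(s, p, o, graph))
        self.graph_provenance[graph] = prov

    def triples(self, predicate=None):
        # Query ignoring provenance, as a plain triple store would.
        return [(q.subject, q.predicate, q.obj)
                for q in self.quads
                if predicate is None or q.predicate == predicate]

    def provenance_of(self, s, p, o):
        # Recover who said what: the point of a quad store over a triple store.
        return [self.graph_provenance[q.graph] for q in self.quads
                if (q.subject, q.predicate, q.obj) == (s, p, o)]

store = QuadStore()
store.assert_statement("Tiger", "genus", "Panthera", "g1",
                       Provenance("enwiki:UserA", "2012-06-21", "IUCN"))
store.assert_statement("Tiger", "genus", "Panthera", "g2",
                       Provenance("dewiki:UserB", "2012-06-20", "GBIF"))
```

The same statement can thus be asserted twice with different provenance — the two graph names keep the assertions apart, which a bare triple store cannot do.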
[Wikidata-l] transclusion v client-server
Dear all, With regard to the whitepaper/demo Gregor asked for, if there's some real interest expressed by the wikidata team in this, that'd be good to hear. So far little/no comment has been offered about the wikitopics approach or transclusion. It's hard to get excited about investing my time and energy if it's being met with a collective yawn by the wikidata team. Seriously, I have no interest in academic exercises nor talking to walls. I sincerely think that a transclusion-based design is a strategic and tactical improvement that merits debate against the client-server API the wikidata team is now creating. thanks - john
Re: [Wikidata-l] Wikidata Transclusions
Wikidata publishing infoboxes and Wikipedias using them is again the client-server model. Not sure where this chestnut is coming from. Transclusion is as close to client-server as my cooking is to being gourmet! There's NO API, so I don't understand your comments at all, sorry. On 13.06.2012 23:48, Nikola Smolenski wrote: On 14/06/12 00:39, jmcclure@hypergrove.com wrote: Transclusion is surely fundamental to wiki application design. The [[wikidata]] proposal by contrast is a client-server API, such things an artifact of the 20th century. What is the point of it here? Ultimately the problem you're grappling with is not just about infoboxes, it's about *anything* other than article text that has multilingual requirements. For instance, the same *pie chart* is to be shared among wikipedias, the only difference being the graph's title, key and other labels... [[wikidata]] is today doing format=table, later other formats. That's a lot to handle in an API. I don't think Wikidata will ever do other formats. Wikidata will only export pure data. So, it's highly advised the client-server API approach be scrapped. At a minimum, it's outdated technology, for good reasons. Instead, wikidata should *publish* infoboxes that are happily cached on wikidata servers. That's the best performance that can possibly be had. Wikidata publishing infoboxes and Wikipedias using them is again the client-server model. And if Wikidata publishes infoboxes, pie charts and the like, THAT will complicate the API, not the current approach. Not to mention that Wikipedias have and want to have different infobox designs.
Re: [Wikidata-l] Wikidata Transclusions
I strongly disagree with the demand to make this the only choice. - Gregor, I'm a bit confused -- are you talking about the transclusion design approach in this statement? Because, if so, I'd think there'd be a number of infobox styles that can be selected by an author on the wikidata platform when 'building' the infobox page. The author can transclude any number/any specific infobox(es) on their wikipedia page, eg {{wikidata:en:Infobox:Some topic/some custom infobox}} On 14.06.2012 03:11, Gregor Hagedorn wrote: While I agree that it is desirable to support simple, preformatted infoboxes that can, with minimal effort, be re-used in a large number of language versions of Wikipedia, I strongly disagree with the demand to make this the only choice. I think the present Wikidata approach to allow local Wikipedias to customize their infoboxes by accessing wikidata properties property-by-property is the right path. The large Wikipedias with many editors have invested considerable creative energy into making quite a large number of infoboxes elaborate information containers. That includes formatting, images and hand-crafted links on both the field-name and the field-value side. Some values are expressed through svg graphics, other values through background color coding, etc. Limiting the usability of Wikidata to plain vanilla infoboxes could cause considerable resistance in these communities. And although small Wikipedias will profit a lot from Wikidata, without the engagement of editors from the large Wikipedias in curating Wikidata content, the increased synergies will not happen. Another issue is that (I believe) Wikidata does not have a notion of ordering properties. Correct? This is no issue for the present Wikidata approach because infoboxes remain curated in each local Wikipedia.
However, in a centralized one-size-fits-all approach, replacing existing infoboxes where information is presented in a logical order with an alphabetical property order would create huge resistance (and would be a complex issue that Wikidata would have to deal with, allowing property ordering and filtering). I believe that Wikidata correctly aims to provide a smooth transition path, where it is possible to obtain only part of an infobox from wikidata and inject wikidata content into existing infobox layouts. That said: I would encourage a third-party contributor to try to create a default Wikidata infobox generator in a way (an extension installable in multiple Wikipedias) that enables a wikipedia to autocreate a good-looking, plain vanilla infobox with minimal effort. Gregor
Re: [Wikidata-l] Wikidata Transclusions
Gregor says Or are you proposing to simply use the existing template programming with the only difference that wikidata is the only mediawiki where the properties can be accessed within templates? Much of my argument assumes that you are looking for a non-template-based infobox renderer; I may be wrong there. I am proposing that, from the perspective of the Tiger article author, I would log on to wikidata and create a page called Infobox:Tiger using a Semantic Form named Taxobox whose fields track to the args for the Taxobox template. On the form, one would select whether a genus, familia, ordo, ... regnum is represented by the page... assume genus was selected, so:

{{Taxobox
| status = EN (DUBLIN CORE PROPERTY: LANGUAGE)
| fossil_range = Early [[Pleistocene]] - Recent (DUBLIN CORE: COVERAGE)
| status_system = iucn3.1 (DUBLIN CORE: FORMAT)
| trend = down (DUBLIN CORE: FORMAT)
| status_ref = <ref name=IUCN/> (DUBLIN CORE: IDENTIFIER)
| image = Tiger in Ranthambhore.jpg (EMBEDDED {{IMAGE}} TEMPLATE CALL)
| image_caption = A [[Bengal tiger]] (''P. tigris tigris'') in India's [[Ranthambhore National Park]]. (PART OF {{IMAGE}} TEMPLATE CALL)
| image_width = (PART OF {{IMAGE}} TEMPLATE CALL)
| regnum = [[Animal]]ia (UNNECESSARY - LOOK IT UP VIA #ASK)
| phylum = [[Chordate|Chordata]] (UNNECESSARY - LOOK IT UP VIA #ASK)
| classis = [[Mammal]]ia (UNNECESSARY - LOOK IT UP VIA #ASK)
| ordo = [[Carnivora]] (UNNECESSARY - LOOK IT UP VIA #ASK)
| familia = [[Felidae]] (UNNECESSARY - LOOK IT UP VIA #ASK)
| genus = ''[[Panthera]]'' (DUBLIN CORE PROPERTY: RELATION)
| species = ''P. tigris'' (DUBLIN CORE PROPERTY: TITLE)
| binomial = ''Panthera tigris'' (EMBEDDED {{BINOMIAL}} TEMPLATE CALL)
| binomial_authority = ([[Carl Linnaeus|Linnaeus]], 1758) (PART OF EMBEDDED {{BINOMIAL}} TEMPLATE CALL)
| synonyms = (DUBLIN CORE PROPERTY: TITLE) <center>''Felis tigris'' <small>[[Carl Linnaeus|Linnaeus]], 1758</small><ref name=Linn1758 /><br />''Tigris striatus'' <small>[[Nikolai Severtzov|Severtzov]], 1858</small><br />''Tigris regalis'' <small>[[John Edward Gray|Gray]], 1867</small></center>
| range_map = Tiger_map.jpg (EMBEDDED {{IMAGE}} TEMPLATE CALL)
| range_map_width = (PART OF EMBEDDED {{IMAGE}} TEMPLATE CALL)
| range_map_caption = Tiger's historic range in ca. 1850 (pale yellow) and range in 2006 (in green).<ref name=dinerstein07 />
| subdivision_ranks = [[Subspecies]]
| subdivision = (UNNECESSARY - LOOK IT UP VIA #ASK) ''[[Bengal tiger|P. t. tigris]]''<br />''[[Indochinese tiger|P. t. corbetti]]''<br />''[[Malayan tiger|P. t. jacksoni]]''<br />''[[Sumatran tiger|P. t. sumatrae]]''<br />''[[Siberian tiger|P. t. altaica]]''<br />''[[South China tiger|P. t. amoyensis]]''<br />†''[[Caspian tiger|P. t. virgata]]''<br />†''[[Bali tiger|P. t. balica]]''<br />†''[[Javan tiger|P. t. sondaica]]''
}}

One could create [[en:Infobox:Tiger]] and [[de:Infobox:Tiger]] to handle language differences. Alternatively there'd be only [[Infobox:Tiger]] with language-qualified string properties, e.g., Title^en contains the English title for the page vs Title^de that contains the German title, with the #ask pulling the correct one given the wiki's language, eg |?Title^{{CONTENTLANG}}. For links, use the same magic word, eg |genus={{CONTENTLANG}}:Panthera. To be a bit fancier: |genus={{#ifexist:{{CONTENTLANG}}:Panthera|{{CONTENTLANG}}:Panthera|en:Panthera}}. Now, the above is a traditional non-topicmap treatment without provenance.
Let's add provenance first: |genus={{Dublin Core|value=Panthera|creator={{USERNAME}}|date={{TODAY}}|language=}} Now let's add the topicmap orientation: |genus={{Topic|name=Panthera|creator={{USERNAME}}|date={{TODAY}}|language=}} So there are certainly alternatives to look at. The basic theme, though, is NOT to create a specialized factbox editor but rather to use Semantic Forms to capture the values of template args - values that can be links, text, or template calls. And you're right, Gregor, the primary way to access wikidata's triples for purposes of regular template programming is to log on to wikidata. imho, that establishes wikidata as a black box, with no additional mechanisms/extensions loaded onto any 'transcluding' wikipedia, which I think is the ideal posture for integrating wikidata resources into the wikipedias. re: the whitepaper. Sure I'd be happy to put together a real demo. But as a strugglin' contractor doing opensource dev since 1998, it'd be nice if. Gotta run - john On 14.06.2012 09:29, Gregor Hagedorn wrote: Gregor, I'm a bit confused -- are you talking about the transclusion design approach in this statement? Yes, in the sense that it demands to be the only access to wikidata content in a Wikipedia. because, if so, I'd think there'd be a number of infobox styles that can be selected by an author on the wikidata platform when 'building' the infobox page. The author can transclude any number/any specific infobox(es) on their wikipedia page, eg {{wikidata:en:Infobox:Some
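The language-fallback idiom discussed in this thread — pull the content language's value if it exists, else fall back to English, as in the {{#ifexist:...}} snippet — is easy to state outside wikitext too. A minimal sketch, assuming a hypothetical store of language-qualified properties (Title^en, Title^de, ...); the page name and data are illustrative only:

```python
# Language-qualified properties on a single shared [[Infobox:Tiger]] page,
# mimicking Title^en / Title^de from the email. Hypothetical data.
page_properties = {
    "Infobox:Tiger": {
        "Title^en": "Tiger",
        "Title^de": "Tiger",
        "Genus^en": "Panthera",
    }
}

def resolve(page, prop, content_lang, fallback_lang="en"):
    """Mimic |?Title^{{CONTENTLANG}} with an #ifexist-style fallback."""
    props = page_properties.get(page, {})
    for lang in (content_lang, fallback_lang):
        qualified = f"{prop}^{lang}"
        if qualified in props:
            return props[qualified]
    return None  # neither the content language nor the fallback exists
```

A German-language wiki asking for Genus gets the English value back, because only Genus^en exists; asking for a property with no value in either language yields nothing.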
Re: [Wikidata-l] [[meta:wikitopics]] updated
Hi Friedrich - IAANAA (I also am not an attorney)! and likely know no more than yourself about the issues. 1. http://en.wikipedia.org/wiki/Topic_Map [3] gives links to ISO/IEC 13250. 2. The 'community' includes developers who contractually/implicitly guarantee the provenance of the tools/apps they deploy. 3. Only the author (the Foundation) of the wikidata tool itself would need to purchase the ISO license - no one else. 4. This is about the methods and processes embedded _within_ the tools, nothing about content managed by the tools. 5. It is certainly possible (if not likely) that the current [[wikidata]] design steps on some patents - one doesn't know until a search is done -- a search that presumptively we can believe has already been performed by the ISO for 13250. regards - john On 13.06.2012 06:07, Friedrich Röhrs wrote: Hi, 1c. You're arguing over CHF 200 -- which extraordinarily cheaply and fundamentally PROTECTS the MWF from copyright infringement suits? Can the SNAK architecture provide that reassurance to the MWF community? imho there are some problems with this argumentation: First, I don't really know what standard you are talking about; I can't find any ISO Topic Map metamodel. The only publication I found was on Topic Maps -- Data Model, which is shortened as TMDM (not TMM). Is that the one you are referring to? There seems to be a whole group of standards based around that specific one, for example ISO 18048: Topic Maps Query Language (TMQL) and ISO 19756: Topic Maps Constraint Language (TMCL) etc. (wouldn't that mean you would have to pay for them all?) Then you seem to be mixing up the MWF as an entity and the MWF community as a collection of individuals. Even if the MWF somehow purchased the documentation of the standard (or standards from above) from ISO, this would mean nothing to people not part of the MWF. The community is not an official part of the foundation, i.e. they are not members, so they would have no right to the content of the standard.
The same seems to be true of any third party wanting to use the data from wikidata. They would need to implement the standard (or even group of standards) if they wanted to successfully use the content offered by wikidata. To be able to do this efficiently they probably would need to buy the standards themselves. This seems to me to be against Wikimedia's policy of ...the creation of content not subject to restrictions on creation, use, and reuse. Having to pay for documentation to be able to understand the structure the content is held in seems to be a restriction on the use of the data. Furthermore I don't really understand your copyright infringement threat. Either we are talking about structure, which, it seems to me, is not protected by copyright (didn't Oracle just fail in court because of that?). Or do we talk about some sort of content that can be protected by copyright? You explicitly said you were not talking about content: You're citing a policy about CONTENT, I see, though I was focusing on data models and technical interface designs... Then again I am no lawyer, so I don't really feel competent enough in that field to give any real advice or have an opinion about what is protected by copyright and what is not. tldr: To me it seems using the ISO standard would force third parties wanting to consume wikidata content to implement that standard too, and thus to pay for information on how to implement it. This would mean a (financial) restriction on the use of content, which is in conflict with Wikimedia's values. Friedrich On Wed, Jun 13, 2012 at 7:38 AM, Nadja Kutz na...@daytar.de wrote: Hello John, thanks for digging this out. I see in the brochure there is not only a postbox, but they even have an office where one could meet the ISO: 1, ch. de la Voie-Creuse. It would be interesting to hear what Wikimedia's opinion on this is.
Nadja Links: -- [3] http://en.wikipedia.org/wiki/Topic_Map
Re: [Wikidata-l] SMW and Wikidata
Hi James, thanks for the pointer to [1]. 1. The main use case of Wikidata (a centralised, multilingual site that serves as a data repository) is different from that of SMW (a data-enhanced MediaWiki) OK, SMW installations can be centralized repositories (try DSMW) and obviously can be multilingual too. So what's the difference between these use cases? The distinction of data repository vs data-enhanced is too fine for me to understand. 2. Also, the more complex structures in Wikidata could be captured in SMW using internal objects. Exactly my point. So build out SMW, don't trivialize SMW, don't harm the SMW community. See [[meta:wikitopics]] for my first cut at capturing the more complex structures in SMW: storing Dublin Core records as subobjects, containing (pointer-to) values-of the property. Hardly rocket science! 3. The user interface of Wikibase Repository will be based on input forms, and thus quite different from SMW. From this I conclude that Wikidata forms will not populate any template (eg {{{for template|Infobox}}}) - but invocation of the same machinery is being done under the covers. From this I conclude that the entire approach of Semantic Forms is being tossed away. It is saying that no one can use wikidata to maintain structured data attached to a page via templates nor use wikidata forms to implement their own forms -- so highly specialized, to what end I cannot fathom. 4. It is not defined yet what kind of query language Wikidata will support in Phase 3 (or thereafter) -- yet another tack that makes SMW irrelevant in a wikidata-world. Why not build out SMW to more efficiently handle multilingual values? SMW installations want that too! In summary about [1], SMW installations want provenance data too, you know. We want to keep using SMW, not install a new, functionally highly similar extension that can only lead to overlap, confusion, and inefficiency. The operative presumption of [1] is that SMW cannot/will not ever support provenance data...
surely the author should give more credit to creative SMW developers. _re_: performance - I am referring to duplicated functionality between wikidata and smw... (I also am very thankful for recent performance work going on)... See #5 in [1] for specific examples of such duplication... and with that fact now documented, how can anyone say that the audiences diverge? - jmc On 13.06.2012 09:41, James HK wrote: Hi John, Being only a minor member of the SMW community, I'd like to respond to some of the assumptions you made about the SMW community as a whole. ... cessation of SMW development Some of us were worried at the beginning of the wikidata project but I think Markus tried to ease those fears a bit in his email from 01 May 2012 (see [1]). SMW subjectively seems to be encountering quality control issues lately SMW as a community relies on its members to ensure quality control, and if you look at the commit/review statistics you can see that only a handful of people have actively committed work for SMW 1.7/SMW 1.8, which means that to exercise quality control the community is relying on those actively involved. ... performan ion to wikidata but I can see from a SMW perspective that several efforts are being considered to cease unnecessary overhead (see [3] [4] [5]). Of course some extensions t s basis such as Semantic Drilldown have yet to come up with an intelligent caching strategy to minimize their impact on performance. (For example, in our case we have nearly 1.5M triplets and we can feel when Semantic Drilldown is doing a database select with a large filter set.) ... [[wikidata]] is doomed for not creating stakeholders within the wiki-user community that includes SMW developers Some of the SMW core developers are actively involved in the wikidata project therefore this fear mi common code base as it would help to ensure quality control in future, and those who know the wikidata code base may feel encouraged to commit to SMW as well.
SMW and wikidata certainly have divergent target audiences but as a community I hope to see a symbiotic relationship between SMW and wikidata without having to refute either of the two. [1] http://wikimedia.7.n6.nabble.com/SMW-and-Wikidata-Was-SMW-devel-Semantic-MediaWiki-and-Wikidata-ContentHandler-td4943107.html [2] http://meta.wikimedia.org/wiki/Wikidata/Notes/SMW_and_Wikidata [3] http://www.semantic-mediawiki.org/wiki/GSoC_2012#Accepted_proposal [4] http://www.semantic-mediawiki.org/wiki/Roadmap#JavaScript_base_for_dynamic_result_formats [5] http://www.semantic-mediawiki.org/wiki/GSoC_2012#SMW_query_management_and_smart_updates Cheers, mwjames On Wed, Jun 13, 2012 at 7:03 AM, jmccl...@hypergrove.com wrote: Denny said: On the other hand, you are not the only person thinking that this (Wikitopics) is a good idea (hello Gerard!), and in the long run Wikidata could be extended to such a system -- but for now I regard this to be out of scope for Wikidata and I
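The [[meta:wikitopics]] idea mentioned earlier in this thread — storing Dublin Core records as subobjects that contain (pointers to) the values of a property — can be sketched as a toy model. This is not SMW's actual storage layout; the function names and sample data are illustrative, and only the dc:* field names come from Dublin Core.

```python
# Each property value on a page gets its own subobject carrying
# Dublin Core provenance fields alongside the value itself.
subobjects = []

def set_value(page, prop, value, creator, date, language):
    subobjects.append({
        "page": page,          # owning wiki page
        "property": prop,      # the property being annotated
        "value": value,        # pointer-to / value-of the property
        "dc:creator": creator, # who asserted the value
        "dc:date": date,       # when it was asserted
        "dc:language": language,
    })

def values(page, prop, language=None):
    # Plain property lookup, optionally narrowed by language tag --
    # provenance rides along without changing the query model.
    return [s["value"] for s in subobjects
            if s["page"] == page and s["property"] == prop
            and (language is None or s["dc:language"] == language)]

set_value("Tiger", "genus", "Panthera", "UserA", "2012-06-13", "en")
set_value("Tiger", "genus", "Panthera", "UserB", "2012-06-13", "de")
```

The point of the sketch is that provenance per value does not require a new data model on top of SMW: a subobject per assertion already carries it.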
Re: [Wikidata-l] SMW and Wikidata
I do appreciate that Denny, Jeroen, and Markus cross-fertilize. But the money is flowing now towards _REWRITING SMW FROM SCRATCH_, which worries me, as I am fairly sure there will be no good migration path at the inevitable time SMW support is terminated. The fact that one (is said) to be only for small wikis (which I dispute) and the other is not, is hardly a functional difference that will prevent confusion, overlap, and inefficiency. As I've said elsewhere, saying one is capable of multilingual support while implying the other is not is simply wrong -- SMW is fine for multilingual support if the implemented data model has appropriate language tags. (note: I am creating a multilingual wiki now, based on SMW, for a client). My point is that for maximum success, wikidata should strive to welcome the SMW community (back) into the MW community, to have support from those who put their professional faith in SMW. By your own admission that there is substantial overlap, Wikidata will consequently and permanently split the SMW community, and this sickens me. John On 13.06.2012 09:04, Lydia Pintscher wrote: On Wed, Jun 13, 2012 at 5:36 PM, jmccl...@hypergrove.com wrote: Hi Lydia, 'We' are people who committed professionally to the SMW (and Halo) approach to enterprise computing and 'we' are clients who have invested in this approach. We'll want to install wikidata-client along with SMW to get at infobox data (if such is the ultimate design). We'll want to install wikidata-host to stay current with where all the investment dollars, the technical interest, etc, are flowing. Yet you assert that SMW and wikidata have different target groups (without defining either). Ok then let me define it more clearly. Wikidata's clear goal is to serve the Wikipedias. Use in other contexts will also be possible and encouraged, but Wikipedia is the main target. It is for a project that values references for all the structured data and it is supposed to serve a multi-language audience.
SMW is well established and used in professional and non-professional projects outside of Wikipedia. It usually serves smaller wikis and has quite some use in companies for their internal knowledge management. Obviously there is overlap, but in the end the projects are distinct enough to co-exist just fine. Markus wrote a long email about this here: http://www.mail-archive.com/semediawiki-devel@lists.sourceforge.net/msg03369.html However, I believe they are the SAME because their objectives are the same: to integrate structured data into the MW editing/display environment. There's no room for multiple implementations of tools with the same objective. If you define the goal that broadly then yes, it might be the same for both. But this is actually too broad. See above.
Re: [Wikidata-l] SMW and Wikidata
Gerard, Sure there's plenty of people who can build a prototype application of this sort -- most people pay mortgages though, so it'd be nice (if not required) to get compensated for such work, as you and the wikidata team are. re the wikidata excuse: it's wise to change one's heading before one's lost. Personally I think anyone creating an SMW-based extension will do so far more swiftly (and cheaply) than what the wikidata team currently has in mind, ie, to rewrite SMW (and various of its extensions) almost from scratch. I have proposed the outline of such an application at [[meta:wikitopics]]. I'm even convinced that the SMW-based application as proposed will be faster operationally. John On 13.06.2012 00:52, Gerard Meijssen wrote: Hoi, There are several points WHY a centrally administered infobox makes sense. The most important one can be found in what the Wikimedia Foundation aims to achieve: Imagine a world in which every single human being can freely share in the sum of all knowledge [1]. This is what we are about; everything that prevents this from happening is an excuse. When you consider the number of articles in most Wikipedias, it is easy to see that the English Wikipedia has more infoboxes than many of them have articles. By providing the facility to have centrally maintained infoboxes, these infoboxes will be extremely lightweight, as both the information and the labels will be maintained in Wikidata. The result is information that is available for localisation. This localisation consists of translating the labels and possibly the information items. I blogged about this in the past ... [2] I am an advisor to the Wikidata project and as such it is my job to make these arguments. Denny is the project manager for the Wikidata project and it is his job to ensure that his team will deliver on the agreed deliverables. Having centrally maintained templates is not part of what his team has agreed to or can be expected to deliver in the short term.
This is a valid excuse; it is valid for now. An excuse for the Wikidata development team does not prevent other people from developing this functionality instead. The basic requirement is for Wikidata to be able to have translatable data items associated with a Wikipedia article. As each of those items is uniquely identified, they can be identified in a template. This template should only refer to the data items. When this functionality is developed, the basic functionality is ready to consider the use of such templates for real in Wikidata clients. I am confident that there are plenty of people who have the expertise to make a functional prototype. Such a prototype can be reviewed by any MediaWiki reviewer for the usual MediaWiki criteria. When this is done, it is no longer an unreasonable burden for the Wikidata team to consider the functionality of such prototypes. Thanks, Gerard [1] www.wikimediafoundation.org [2] http://ultimategerardm.blogspot.nl/2012/05/wmdevdays-wikidata.html On 13 June 2012 00:03, jmccl...@hypergrove.com wrote: Denny said: On the other hand, you are not the only person thinking that this (Wikitopics) is a good idea (hello Gerard!), and in the long run Wikidata could be extended to such a system -- but for now I regard this to be out of scope for Wikidata and I will not devote resources for this. _IT CAN BE ADDED LATER ANYWAY_. Denny, I never see the long-run!
Anyway, to get real, be aware there are specific concerns about [[wikidata]] within the SMW community in the here and now: * we worry that our sites are threatened by virtual cessation of SMW development -- this may already be happening a bit, as SMW subjectively seems to be encountering quality control issues lately * we worry that, whenever we install the [[Wikidata]] extension, the performance of client sites will be affected by the burden of multiple forms, query and format software modules, syntaxes, styles, artifacts etc * we worry that, since no specific problems experienced by wiki-users have yet been identified that [[Wikidata]] will fix, in the end [[wikidata]] is doomed for not creating stakeholders within the wiki-user community that includes SMW developers. jmc
[Wikidata-l] Wikidata Transclusions
I base my belief that [[wikitopics]] is operationally faster on a basic difference between the two designs, as I think the wikipedias will operate faster if they merely transclude infoboxes of their choice, at their own speed, from the wikidata central repository. Transclusion is surely fundamental to wiki application design. The [[wikidata]] proposal by contrast is a client-server API, such things an artifact of the 20th century. What is the point of it here? Ultimately the problem you're grappling with is not just about infoboxes, it's about *anything* other than article text that has multilingual requirements. For instance, the same *pie chart* is to be shared among wikipedias, the only difference being the graph's title, key and other labels... [[wikidata]] is today doing format=table, later other formats. That's a lot to handle in an API. So, it's highly advised the client-server API approach be scrapped. At a minimum, it's outdated technology, for good reasons. Instead, wikidata should *publish* infoboxes that are happily cached on wikidata servers. That's the best performance that can possibly be had.
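The performance claim above — a transcluded infobox is rendered once and then served from cache, while an uncached per-request API pays the rendering cost every time — can be illustrated with a toy model. The cache dictionary and render function are hypothetical stand-ins for squid/Varnish caching and wikitext rendering, not any real MediaWiki interface.

```python
render_count = 0  # how often the "server" actually does work

def render_infobox(topic):
    # Stand-in for wikidata rendering an infobox page.
    global render_count
    render_count += 1
    return f"{{{{Infobox}}}} for {topic}"

cache = {}

def transclude(topic):
    # Transclusion model: the first request renders; later ones hit the cache.
    if topic not in cache:
        cache[topic] = render_infobox(topic)
    return cache[topic]

def api_fetch(topic):
    # Client-server model without caching: every request hits the server.
    return render_infobox(topic)

for _ in range(3):
    transclude("Tiger")
transclusion_renders = render_count   # 1 render serves all three requests

for _ in range(3):
    api_fetch("Tiger")
api_renders = render_count - transclusion_renders   # one render per request
```

The model deliberately omits cache invalidation; the thread's earlier question about "[[modification date::]]" of each transclusion is exactly the invalidation problem this sketch leaves out.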
Re: [Wikidata-l] Wikidata Transclusions
I don't understand why it's so unlikely, Lydia. ANY educational article (science, math, engineering) can have graphics whose underlying data is not language-sensitive. How about timelines on a bio article -- that's another example. Or a map within a place article? Or financial data within a business article? I think these are more likely than the scenario that concerns you, where the *data itself* used to construct the graphic is language- or country-sensitive. On 13.06.2012 16:03, Lydia Pintscher wrote: Hi John, On Thu, Jun 14, 2012 at 12:39 AM, jmccl...@hypergrove.com wrote: I base my belief that [[wikitopics]] is operationally faster on a basic difference between the two designs, as I think the wikipedias will operate faster if they merely transclude infoboxes of their choice, at their own speed, from the wikidata central repository. Transclusion is surely fundamental to wiki application design. The [[wikidata]] proposal by contrast is a client-server API, such things an artifact of the 20th century. What is the point of it here? Ultimately the problem you're grappling with is not just about infoboxes, it's about *anything* other than article text that has multilingual requirements. For instance, the same *pie chart* is to be shared among wikipedias, the only difference being the graph's title, key and other labels... [[wikidata]] is today doing format=table, later other formats. That's a lot to handle in an API. Others can probably comment more on the rest of your email but here's one thing: It will very very likely not be the same pie chart. The Wikipedias have quite different demands as to what they want to show and what is important to them. So, it's highly advised the client-server API approach be scrapped. At a minimum, it's outdated technology, for good reasons. Instead, wikidata should *publish* infoboxes that are happily cached on wikidata servers. That's the best performance that can possibly be had.
-- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Re: [Wikidata-l] SMW and Wikidata
sorry daniel for saying that, lydia is right on. collegially yours - john

On 13.06.2012 16:15, Lydia Pintscher wrote:

On Wed, Jun 13, 2012 at 7:41 PM, jmccl...@hypergrove.com wrote:

Daniel - this distinction between facts vs claims is happy bullshit merely meant to calm the masses. A claim is a fact and a fact is a claim. It is the existence of PROVENANCE data in the Wikidata data model that distinguishes the two tools. Provenance data is at the heart of the web-of-trust, the top rung of the Internet architecture promulgated by the W3. So, if your view is that SMW is not for provenance data, while Wikidata is for provenance data, then how can I not conclude SMW is down-version? Why would I not toss SMW for Wikidata since they BOTH handle structured data?

It is not happy bullshit. I am not sure what you want to hear, really. You seem to have made up your mind. If you need provenance data for a specific use-case and SMW doesn't give that to you, then it might indeed not be a good fit for that particular use-case. That doesn't make SMW useless in any way, however ;-) http://meta.wikimedia.org/wiki/Wikidata/Notes/ContentHandler by the way. And if I remember correctly, some of the SMW devs even said that this would be useful to have for SMW. It isn't something we're imposing on anyone, but it can be very useful once done. It's one of the ways where SMW can benefit from groundwork done for Wikidata. More will probably come up.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher [1]
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de [2]

Wikimedia Deutschland - Society for the Promotion of Free Knowledge (registered association). Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
___ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org [3] https://lists.wikimedia.org/mailman/listinfo/wikidata-l [4]

Links: -- [1] http://about.me/lydia.pintscher [2] http://www.wikimedia.de [3] mailto:Wikidata-l@lists.wikimedia.org [4] https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Re: [Wikidata-l] SMW and Wikidata
Hi Lydia,

Specifically, I'm keen to understand whether:

* [[wikidata]] is committed to building out SMW for provenance data (and ContentHandler, per SRF formats);
* [[wikidata]] is establishing a base grammar for everyone to share via Type, Tag, Subject and other namespaces;
* [[wikidata]]'s multi-language labels requirement is met via TopicMaps, whose focus is scoping topic-names;
* [[wikidata]] is a transcludable repository of infoboxes, SRF graphics, indexes, etc. relevant to any topic.

thanks - john

On 13.06.2012 16:15, Lydia Pintscher wrote:

On Wed, Jun 13, 2012 at 7:41 PM, jmccl...@hypergrove.com wrote:

It is the existence of PROVENANCE data in the Wikidata data model that distinguishes the two tools. Provenance data is at the heart of the web-of-trust, the top rung of the Internet architecture promulgated by the W3. So, if your view is that SMW is not for provenance data, while Wikidata is for provenance data, then how can I not conclude SMW is down-version? Why would I not toss SMW for Wikidata since they BOTH handle structured data?

I am not sure what you want to hear, really.
Re: [Wikidata-l] [[meta:wikitopics]] updated
1. As mentioned several times, a standard, to be considered by us, must be free. Free as in: everyone can get it without having to pay or register for it; I can give it to anyone legally without any restrictions; free of patents. Free as in W3C.

2. I have taken another look at your page, and after starting to read it you simply lose me. You use so many terms without defining them. To give just a few examples:

* "The NIF ontology is incorporated into the ontology for Wikitopics which shapes API designs." I do not know what the Wikitopics ontology is. The section beneath just lists a few keywords, but does not really explain it. I do not know what it means for ontologies to incorporate one another. I do not know what it means for an ontology to shape API designs.

* "Wikipage naming conventions are used to name subobjects in an equally meaningful manner." Equally meaningful? To what? What does this even mean? You completely lost me here.

* For the key wikipage transclusions, you do not explain what a formatted topic presentation is, a formatted topic index, or a formatted infobox. I think I understand the latter, but not the previous two. What are they? And if I indeed understand it right, are you saying that infoboxes have to be completely formatted in Wikidata, as Gregor has asked?

Hello Denny,

1. There are likely several ways to accommodate your process requirements. And btw, I asked last month but received no response for a citation to the relevant WMF policy on this issue, to detect whether your statement reflects the team's ELECTIVE policy or a WMF policy. Where's the benefit from imposing expenses magnitudes greater on everyone, to design, develop, and socialize solutions already known? And please mention how the wikidata community can be assured that the wikidata team's designs themselves don't infringe someone else's patent or copyright, a reassurance that would directly follow from WMF's purchase of rights to use an ISO standard.

2a.
Surely you appreciate that Wikidata involves fielding ''some'' ontology, at least as suggested by your intention to include the (SMW) Property namespace. I don't know when you plan to publish wikidata's ontology, but certainly it must be done overtly and soon, agile or not. I agree the ontology I proposed needs much fleshing out, but the chief goals of the proposed ontology are pretty clear -- to provide a wiki-topic index, to support NIF tools directly, to capture provenance data, to reuse existing SMW tools and key international standards, and to establish various best-practices for the wider community.

2b. An ontology that 'shapes/controls API interfaces' means that the APIs' information model must align with the information model represented by the ontology. If the ontology includes an expiration-date as a required property, for instance, then the API needs to include an expiration-date as a required parameter in some fashion.

2c. One ontology incorporating another is perhaps a clumsy way to describe the process of associating a class or property defined in one ontology to another in a different ontology, either through a subclass/subproperty relation or a documented or implemented transform.

2d. Equally meaningful as the wiki-page naming conventions are: interwiki:lang:ns:pgnm is quite meaningful, and I am proposing SMW subobjects be named similarly, eg scope:lang:type:name.

2e. A 'formatted topic presentation' is the content displayed on a page for a topic. Wikidata will have a page called (Main:)Thomas Jefferson that displays a formatted topic presentation, showing information harvested from other wikis plus any information developed by the wikidata community itself. Using transclusion, anyone can embed (Main:)Thomas Jefferson into their wiki.
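The alignment described in 2b can be made concrete with a minimal sketch (Python). All names here -- the ontology dictionary, the "Claim" entity type, the parameter names -- are hypothetical illustrations of the idea, not any real Wikidata or SMW API:

```python
# Sketch: an ontology that declares required properties, and an API
# validator whose information model is derived from that ontology.
# Hypothetical names throughout; illustrates "the ontology shapes the API".

ONTOLOGY = {
    "Claim": {
        "required": ["subject", "property", "value", "expiration-date"],
        "optional": ["source"],
    },
}

def validate_api_request(entity_type, params):
    """Reject API calls whose parameters do not align with the ontology."""
    schema = ONTOLOGY[entity_type]
    missing = [p for p in schema["required"] if p not in params]
    unknown = [p for p in params
               if p not in schema["required"] + schema["optional"]]
    if missing:
        raise ValueError(f"missing required parameter(s): {missing}")
    if unknown:
        raise ValueError(f"unknown parameter(s): {unknown}")
    return True
```

Because the validator is generated from the ontology, declaring expiration-date as a required property automatically makes it a required API parameter, which is the sense in which the ontology 'shapes' the API design.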
A 'formatted topic index' (which certainly can be one part of a topic's formatted presentation) is a snippet that corresponds to the Thomas Jefferson heading in a subject index, under which are many subtopics, eg:

Jefferson, Thomas [1] [2] [3]
-- Early years [4] [5] [6]
-- Birth [7] [8]
-- Formative influences [9] [10]
-- etc

2f. Perhaps you missed my immediate reply [1] to Gregor. Yes, all infoboxes (among other non/formatted artifacts) are '''transcluded''' from wikidata, without the nonsense of cross-wiki API calls for individual data-items, as I understand the wikidata team is now gearing up to provide.

Best regards - jmc

[1] http://lists.wikimedia.org/pipermail/wikidata-l/2012-May/000588.html , for instance
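The scope:lang:type:name subobject naming convention proposed in 2d could be handled by a small helper like this sketch (Python). The four field names come from the proposal itself; the example identifier and the function name are illustrative assumptions:

```python
from collections import namedtuple

# The four fields of the proposed subobject naming convention
# (scope:lang:type:name), mirroring interwiki:lang:ns:pgnm for pages.
SubobjectName = namedtuple("SubobjectName", "scope lang type name")

def parse_subobject_name(raw):
    """Split a scope:lang:type:name identifier into its four fields.

    The trailing name may itself contain colons (like a page title),
    so split at most three times.
    """
    parts = raw.split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected scope:lang:type:name, got {raw!r}")
    return SubobjectName(*parts)

# e.g. parse_subobject_name("wikipedia:en:infobox:Thomas Jefferson")
```

One design point this makes visible: exactly as with interwiki page names, the structured prefix (scope, language, type) is machine-parseable while the free-form name stays human-readable.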
Re: [Wikidata-l] [[meta:wikitopics]] updated
Hi all! I'm a bit mystified here about the radio-silence to this thread, or, for that matter, to the [[meta:wikitopics]] [1] document itself. From wikipedia: [2]

REINVENTING THE SQUARE WHEEL is the practice of unnecessarily engineering artifacts that provide functionality already provided by existing standard artifacts (reinventing the wheel) and ending up with a worse result than the standard (a square wheel [3]). This is an anti-pattern [4] which occurs when the engineer is unaware or contemptuous of the standard solution, or does not understand the problem or the standard solution sufficiently to avoid problems overcome by the standard.

Thanks! -jmc

Links: -- [1] https://meta.wikimedia.org/wiki/Wikitopics [2] http://en.wikipedia.org/wiki/Reinvent_the_wheel#Related_phrases [3] http://en.wikipedia.org/wiki/Square_wheel [4] http://en.wikipedia.org/wiki/Anti-pattern
[Wikidata-l] [[meta:Wikitopics]] posted
Dear all, Comments are welcomed about a page I just posted, which posits the use of the topic map metamodel (TMM) in the wikidata project. I'll do my best to respond, but frankly my focus these days is on more mundane topics like simple survival. Cheers - john Belltower Wikis