Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-12 Thread Thomas PT
This plan sounds great! Thank you! A question about the tags used: would it be possible instead of having a "mis+Q7654321" internally and "mis" externally to use a private use subtag [1] like "mis-x-Q7654321" or "de-x-Q1980305" (or maybe "mis-x-wd-Q7654321" and "de-x-wd-Q1980305") that would
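
A minimal sketch of the private-use form suggested above, not an agreed implementation. The helper name `private_use_tag` and the optional `marker` argument are hypothetical, introduced only for illustration. It assumes the BCP 47 rule that every subtag after "x" is 1 to 8 alphanumeric characters, so an item ID longer than 8 characters would not fit in a single subtag.

```python
import re

# BCP 47 private-use subtags: 1 to 8 ASCII letters or digits each.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def private_use_tag(base: str, item_id: str, marker: str = "") -> str:
    """Build a tag such as "mis-x-Q7654321" or "de-x-wd-Q1980305".

    `base` is the standard language code, `item_id` the Wikidata item ID,
    and `marker` an optional extra subtag such as "wd" (hypothetical helper,
    not part of any Wikibase API).
    """
    parts = [marker, item_id] if marker else [item_id]
    for part in parts:
        if not _SUBTAG.fullmatch(part):
            raise ValueError(f"not a valid private-use subtag: {part!r}")
    return "-".join([base, "x", *parts])


print(private_use_tag("mis", "Q7654321"))
print(private_use_tag("de", "Q1980305", marker="wd"))
```

The length check matters for this proposal: "Q7654321" is exactly 8 characters, so longer item IDs would need a different encoding.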

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Stas Malyshev
Hi! > For instance, we not only need identifiers for German, Swiss and > Austrian German. We also need identifiers for German German before > and after the spelling reform of 1901, and before and after the > spelling reform of 1996. We will also Theoretically, BCP 47 should be able to handle

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Stas Malyshev
Hi! > We will want to distinguish "a known language not on this list (mis)" from "an > unknown language (und)" and "translingual" (Wiktionary uses "mul" for > translingual, but that's not technically correct). I think "mul" is for "text in more than one language" and there's also "zxx" is for

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Gerard Meijssen
Hoi, The standard is flexible. It allows you to add user defined parts. It allows for languages that have no recognised language code. The point is that the solution for external parties cannot be found in Wikidata itself. We have to use the standards if we want interoperability. We need

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 18:12, Denny Vrandečić wrote: > So assume we enter a new Lexeme in Examplarian (which has a Q-Item), but > Examplarian has no language code for whatever reason. What language code would > they enter in the MultilingualTextValue? My plan is: it will be "mis+Q7654321" internally,

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 19:24, Denny Vrandečić wrote: > Daniel, I agree, but isn't that what Multilingual Text requires? A language > code? Yes. Well, internally, it just has to be *some* unique code. But for interoperability, we want it to be a standard code. So I propose to internally use something

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Denny Vrandečić
Daniel, I agree, but isn't that what Multilingual Text requires? A language code? I.e. how does the current model plan to solve that? I assume most of it is hidden behind mini-wizards like "Create a new lexeme", which actually make sure the multitext language and the language property are

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
On 10.04.2017 at 18:56, Gerard Meijssen wrote: > Hoi, > The standard for the identification of a language should suffice. I know no standard that would be sufficient for our use case. For instance, we not only need identifiers for German, Swiss and Austrian German. We also need identifiers for

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Gerard Meijssen
Hoi, The standard for the identification of a language should suffice. As long as we follow the standard and insist on the identification in this manner it is always possible to provide an identification. When you insist on an item ID, that item ID needs to have a language code and this language

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Denny Vrandečić
So assume we enter a new Lexeme in Examplarian (which has a Q-Item), but Examplarian has no language code for whatever reason. What language code would they enter in the MultilingualTextValue? On Mon, Apr 10, 2017 at 8:42 AM Daniel Kinzler wrote: > Tobias' comment

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-10 Thread Daniel Kinzler
Tobias' comment made me realize that I did not clarify one very important distinction: there are two kinds of places where a "language" is needed in the Lexeme data model: 1) the "lexeme language". This can be any Item,

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-07 Thread Info WorldUniversity
Denny, Yes, yet the timing is good for these great developments you're making with languages in Wikidata4Wiktionary. Cheers, Scott On Fri, Apr 7, 2017 at 11:50 AM, Denny Vrandečić wrote: > Scott, > > I assume you realized that the article by Norvig you cited was rather

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-07 Thread Denny Vrandečić
Scott, I assume you realized that the article by Norvig you cited was rather intentionally published on April 1st. Cheers, Denny On Fri, Apr 7, 2017 at 11:04 AM Scott MacLeod < worlduniversityandsch...@gmail.com> wrote: > I tried to see how the ISO codes and IANA language subtags compare with

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-07 Thread Scott MacLeod
I tried to see how the ISO codes and IANA language subtags compare with Glottolog's 8,444 entries under languages ( http://glottolog.org/glottolog/language) and Ethnologue's 7,099 living languages (https://www.ethnologue.com/), but couldn't find any comparisons or comparative lists. Will it be

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-07 Thread Stas Malyshev
Hi! > Something like de+Q1980305 could work; when generating HTML or RDF, we'd just > drop the suffix. For translingual entries (e.g. the for number symbol i), we > could use e.g. mis+Q1140046. I think for those that are not in a particular language, und or zxx could be better. mis as I read it is
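
The "drop the suffix" step quoted above can be sketched as follows. This is only an illustration under the assumption that the internal form is a standard code, optionally followed by "+" and an item ID; the function names `split_internal_code` and `external_code` are hypothetical and not taken from the Wikibase code base.

```python
from typing import Optional, Tuple


def split_internal_code(code: str) -> Tuple[str, Optional[str]]:
    """Split an internal code like "de+Q1980305" into ("de", "Q1980305").

    A code without a "+" suffix yields (code, None).
    """
    base, sep, item_id = code.partition("+")
    return base, (item_id if sep else None)


def external_code(code: str) -> str:
    """Return the standard code to emit in HTML or RDF (suffix dropped)."""
    return split_internal_code(code)[0]


for internal in ("de+Q1980305", "mis+Q1140046", "en"):
    print(internal, "->", external_code(internal))
```

Keeping the split in one place means the item ID stays available internally while external consumers only ever see the standard code.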

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-07 Thread David Cuenca Tudela
Personally I would prefer a mixed approach, where there is a list of top-level items that are authorized, and then verifying that the item used is a subclass of any of those items. Whether those constraints are hard-enforced or just supervised could be a topic of discussion, but IMHO the more

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-06 Thread Gerard Meijssen
Hoi, There are many valid possibilities to describe something that is not a language and language used may represent a language that does not have a language code. There is a standard for indicating languages; it allows for something like "US-American Spanish" by combining a country and a language

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-06 Thread Denny Vrandečić
On Thu, Apr 6, 2017, 16:16 Stas Malyshev wrote: > Hi! > > > - use Q-Items instead of UserLanguageCodes for Multilingual texts (which > > would be quite a migration) > > I foresee that might be a bit of a problem for external tools consuming > this data - how they would

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-06 Thread Stas Malyshev
Hi! > - use Q-Items instead of UserLanguageCodes for Multilingual texts (which > would be quite a migration) I foresee that might be a bit of a problem for external tools consuming this data - how they would figure out what language it is if it doesn't have a code? We could of course generate

Re: [Wikidata] Languages in Wikidata4Wiktionary

2017-04-06 Thread Tobias Schönberg
An example using the second suggestion: If I would like to query all L-items that contain a combination of letters and limit those results by getting the Q-items of the language and limit those, to those that have Latin influences. In my imagination this would work better using the second