Hi Adam,

Thanks for your well-intentioned letter. Do you know about Wikidata and
the recent developments to support machine-readable Lexicographical data?
I would like to invite you to take a look at:
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data

The system is still in its early stages, but you can take a look at
examples such as:
https://www.wikidata.org/wiki/Lexeme:L11
https://www.wikidata.org/wiki/Lexeme:L403

If you have any questions about this, please do ask.

Regards,
Micru

On Wed, May 30, 2018 at 3:01 AM, Adam Sobieski <[email protected]> wrote:

> INTRODUCTION
>
> Machine-utilizable lexicons can enhance a great number of speech and
> natural language technologies. Scientists, engineers and technologists –
> linguists, computational linguists and artificial intelligence researchers
> – eagerly await the advancement of machine lexicons which include rich,
> structured metadata and machine-utilizable definitions.
>
> Wiktionary, a collaborative project to produce a free-content multilingual
> dictionary, aims to describe all words of all languages using definitions
> and descriptions. The Wiktionary project, brought online in 2002, includes
> 139 spoken languages and American Sign Language [1].
>
> This letter hopes to inspire exploration into and discussion regarding
> machine wiktionaries, machine-utilizable crowdsourced lexicons, and
> services which could exist at https://machine.wiktionary.org/ .
>
> LEXICON EDITIONING
>
> The premise of editioning is that one version of the resource can be more
> or less frozen, e.g. a 2018 edition, while wiki editors collaboratively
> work on a next version, e.g. a 2019 edition. Editioning can provide
> stability for complex software engineering scenarios utilizing an online
> resource. Some software engineering teams, however, may choose to utilize
> fresh dumps or data exports of the freshest edition.
>
> SEMANTIC WEB
>
> A machine-utilizable lexicon could include a semantic model of its
> contents and a SPARQL endpoint.
>
> MACHINE-UTILIZABLE DEFINITIONS
>
> Machine-utilizable definitions, available in a number of knowledge
> representation formats, can be granular, detailed and nuanced.
>
> There exist a large number of use cases for machine-utilizable
> definitions. One use case is providing natural language processing
> components with the capabilities to semantically interpret natural
> language, to utilize automated reasoning to disambiguate lexemes, phrases
> and sentences in context. Some contend that the best output after a
> natural language processing component processes a portion of natural
> language is each possible interpretation, perhaps weighted via statistics.
> In this way, (1) natural language processing components could process
> ambiguous language, (2) other components, e.g. automated reasoning
> components, could narrow sets of hypotheses utilizing dialogue contexts,
> (3) other components, e.g. automated reasoning components, could narrow
> sets of hypotheses utilizing knowledge base content, and (4)
> mixed-initiative dialogue systems could also ask users questions to narrow
> sets of hypotheses. Such disambiguation and interpretation would utilize
> machine-utilizable definitions of senses of lexemes.
>
> CONJUGATION, DECLENSION AND THE URL-BASED SPECIFICATION OF LEXEMES AND
> LEXICAL PHRASES
>
> A grammatical category [2] is a property of items within the grammar of a
> language; it has a number of possible values, sometimes called grammemes,
> which are normally mutually exclusive within a given category. Verb
> conjugation, for example, may be affected by the grammatical categories of:
> person, number, gender, tense, aspect, mood, voice, case, possession,
> definiteness, politeness, causativity, clusivity, interrogativity,
> transitivity, valency, polarity, telicity, volition, mirativity,
> evidentiality, animacy, associativity, pluractionality, reciprocity,
> agreement, polypersonal agreement, incorporation, noun class, noun
> classifiers, and verb classifiers in some languages [3].
>
> By combining the grammatical categories from each and every language
> together, we can precisely specify a conjugation or declension.
> For example, the URL:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&…
>
> includes an edition, a language of a lemma, a lemma, a lexical category,
> and conjugates (with ellipses) the verb in a language-independent manner.
>
> We can further specify, via URL query string, the semantic sense of a
> grammatical element:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&...&sense=4
>
> Specifying a grammatical item fully in a URL query string, as indicated in
> the previous examples, could result in a redirection to another URL.
>
> That is, the URL:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&…
>
> could redirect to:
>
> https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678
>
> or to:
>
> https://machine.wiktionary.org/wiki/2018/12345678/
>
> and the URL with a specified semantic sense:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&...&sense=4
>
> could redirect to:
>
> https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678&sense=4
>
> or to:
>
> https://machine.wiktionary.org/wiki/2018/12345678/4/
>
> The URL https://machine.wiktionary.org/wiki/2018/12345678/ is intended to
> indicate a conjugation or declension with one or more meanings or senses.
> The URL https://machine.wiktionary.org/wiki/2018/12345678/4/ is intended
> to indicate a specific sense or definition of a conjugation or declension.
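
The URL scheme described above could be sketched roughly as follows. This is
only an illustration under stated assumptions: machine.wiktionary.org is the
letter's proposed (hypothetical) host, the function names lookup_url and
canonical_url are invented here, and only the grammatical categories that the
letter spells out are passed. The categories the letter elides with an
ellipsis are left out rather than guessed.

```python
from urllib.parse import urlencode

# Hypothetical base URL taken from the letter; not a known, existing service.
BASE = "https://machine.wiktionary.org/wiki"


def lookup_url(edition, language, lemma, category, sense=None, **grammemes):
    """Build a lookup.php URL from an edition, a lemma and grammatical categories."""
    params = {
        "edition": edition,
        "language": language,
        "lemma": lemma,
        "category": category,
    }
    # Keyword arguments such as person, number, tense, aspect and mood are
    # appended in the order given.
    params.update(grammemes)
    if sense is not None:
        params["sense"] = sense
    return f"{BASE}/lookup.php?{urlencode(params)}"


def canonical_url(edition, form_id, sense=None):
    """Build the short path form that a lookup URL could redirect to."""
    url = f"{BASE}/{edition}/{form_id}/"
    if sense is not None:
        url += f"{sense}/"
    return url


if __name__ == "__main__":
    print(lookup_url(2018, "en-US", "fly", "verb",
                     person="first-person", number="singular", tense="past",
                     aspect="past_simple", mood="indicative", sense=4))
    # 12345678 is the placeholder ID used in the letter.
    print(canonical_url(2018, 12345678, sense=4))
```

How a server would map a fully specified query to an ID such as 12345678 is
not described in the letter, so the sketch stops at building the two URL
forms.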
>
> A benefit of having URLs for both conjugations or declensions and for
> specific meanings or senses is that HTTP request headers [4] can specify
> the languages and content types of the output desired for a particular URL.
>
> The provided examples are intended to indicate that each complete,
> language-independent conjugation or declension, rather than each headword
> or lemma, can have an ID number. Instead of one ID number for all
> variations of “fly”, there is one ID number for “flew”, another for “have
> flown”, another for “flying”, and one for each conjugation or declension.
> Reasons for indexing the conjugations and declensions instead of
> traditional headwords or lemmas include that, at least for some knowledge
> representation formats, the formal semantics of the definitions vary per
> conjugation or declension.
>
> CONCLUSION
>
> This letter broached machine wiktionaries and some of the services which
> could exist at https://machine.wiktionary.org/ . It is my hope that this
> letter indicated a few of the many exciting topics with regard to
> machine-utilizable crowdsourced lexicons.
>
>
> REFERENCES
>
> [1] https://en.wiktionary.org/wiki/Index:All_languages#List_of_languages
> [2] https://en.wikipedia.org/wiki/Grammatical_category
> [3] https://en.wikipedia.org/wiki/Grammatical_conjugation
> [4] https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Request_fields
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Etiamsi omnes, ego non
