Re: [Intellego] Structured data model for Wiktionary

Gordon P. Hemsley Sun, 21 Dec 2014 16:53:12 -0800

On 12/21/2014 07:05 PM, Axel Hecht wrote:
> Can you provide some use cases and/or purpose? That's my stock reply
> when someone asks me about a data model. Gandalf and stas can sing that song
> forwards and backwards ;-)

Well, this is probably unlike the data models you're used to, in that
our primary goal is simply to encode the data rather than to fulfill
any particular use case. The idea is to encode the data in the most
accessible form we can and let the use cases fall out of that.

That said, the Introduction of the proposal gives a general overview of
the expected uses of the data: dictionary, reverse dictionary,
thesaurus, rhyming dictionary, etc. The data should be easy to turn
into any number of word- and definition-related documents for
intuitive consumption by humans.
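
As a rough illustration of how those uses can fall out of one structured
record, here is a minimal sketch. It is not the schema from the proposal:
the class names, the field names (including the "rhyme" key), and the
sample entries are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    gloss: str                                   # short definition text
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Lexeme:
    lemma: str
    language: str                                # language code, e.g. "en"
    rhyme: str                                   # hypothetical rhyme key
    senses: List[Sense] = field(default_factory=list)


# Made-up sample data, only to exercise the functions below.
ENTRIES = [
    Lexeme("cat", "en", "at", [Sense("a small domesticated feline", ["kitty"])]),
    Lexeme("hat", "en", "at", [Sense("a covering for the head", ["cap"])]),
    Lexeme("dog", "en", "og", [Sense("a domesticated canine", ["hound"])]),
]


def dictionary(entries: List[Lexeme]) -> Dict[str, List[str]]:
    """Map each lemma to the glosses of its senses."""
    return {e.lemma: [s.gloss for s in e.senses] for e in entries}


def reverse_dictionary(entries: List[Lexeme], keyword: str) -> List[str]:
    """Return lemmas that have a gloss containing the keyword as a word."""
    return [
        e.lemma
        for e in entries
        if any(keyword in s.gloss.split() for s in e.senses)
    ]


def thesaurus(entries: List[Lexeme]) -> Dict[str, List[str]]:
    """Map each lemma to the sorted synonyms of all its senses."""
    return {
        e.lemma: sorted({syn for s in e.senses for syn in s.synonyms})
        for e in entries
    }


def rhyming_dictionary(entries: List[Lexeme]) -> Dict[str, List[str]]:
    """Group lemmas by their rhyme key, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for e in entries:
        groups[e.rhyme].append(e.lemma)
    return dict(groups)
```

Each "document" is just a different view over the same entries, which is
the sense in which the use cases follow from the encoding rather than the
other way around.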

Of particular note to the Intellego project, however, is its benefit for
machine translation and other fields of computational linguistics. The
various Wiktionary projects are already human consumable; what they are
not, generally, is machine consumable.

So the goal here is to create a data model that will make the
human-consumable output more consistent while also giving computational
processes a much more accessible way to use the data.

> Also, I'm wondering if some pieces in particular in the implementation
> section should/could/need to be language dependent?

I'm curious as to what you have in mind. Could you provide some examples?

Wikidata is a multilingual place to interconnect the various
language-dependent Wikimedia sites (and others), so it was on that
principle that this was based: Provide a centralized location for words
and definitions so that content is not unevenly distributed or
unnecessarily duplicated.

This idea takes it a step further, however, in that the end goal (the
way I see it) is not to maintain separate wikis for separate
languages, like Wikipedia does. Given the nature of the information
being encoded, it makes more sense to me for it all to live in a single
location and be localized in place there.

> Axel
> 
> On 12/22/14 12:38 AM, Gordon P. Hemsley wrote:
>> Hey all,
>>
>> One of the ideas that came out of our visit to LREC back in May was the
>> need for an Open dictionary with a machine-readable data structure.
>> Wiktionary seemed like a natural source of the data itself, so I've set
>> about investigating how to get it converted and implemented as
>> structured data.
>>
>> Many people have iterated on the idea of leveraging the Wikibase code
>> used by Wikidata to store the information, and a number of proposals
>> have been put forth over the years. I have put forward my own proposal,
>> based on reading the previous proposals and my own knowledge of
>> linguistics, and I would love to get more feedback on it:
>>
>> https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2014-10
>>
>>
>> I introduced myself to Lydia Pintscher, the product manager for Wikidata
>> at Wikimedia Deutschland, at Wikimania in August, and told her that
>> we're interested in helping out with the implementation of this idea,
>> and I've had further online interactions with her since. Unfortunately,
>> though, this project is seen as a low priority by the Wikimedia folks,
>> and it will need a grassroots effort to get off the ground any time soon.
>>
>> I'd be happy to hear from anyone who is interested in helping out. There
>> are some blockers in the Wikibase codebase that will need (PHP)
>> development before we can really move ahead on this, but I'm also
>> interested in simply hearing other people's ideas. Feel free to drop by
>> #intellego on Mozilla IRC or #wiktionary or #wikidata on FreeNode if you
>> want to talk in real-time.
>>
>> Regards,
>> Gordon
>>
>>
> 

-- 
Gordon P. Hemsley
http://gphemsley.org/
_______________________________________________
tools-l10n mailing list
https://lists.mozilla.org/listinfo/tools-l10n
