Thank you for the paper. I like the overview in this paper and the
clear description of Wiktionary parsing difficulties.

At the beginning of wikokit development I considered using a
finite-state machine to extract data, but it was too complex
for me, and Wiktionary data formatting varies too much in kind and
quality :) So I chose ordinary procedural programming with short
regular expressions.

But your project proves that finite-state machines can be used in
non-trivial situations. Great!
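
A minimal sketch of the procedural approach with short regular
expressions described above. It is not wikokit's actual code: the
function name extract_senses and the sample text are hypothetical, and
the sample is modeled on the {{senseid}} / {{sense}} markup quoted
further down in this thread. Real Wiktionary entries deviate from this
layout often, so a real parser would need many more cases.

```python
import re

# Hypothetical sample, modeled on the markup quoted in this thread.
WIKITEXT = """\
# {{senseid|en|establishment}}An [[establishment]], whether actual or virtual.
* {{sense|establishment}} [[shop]]
{{trans-top|an establishment}}
"""

# Definition line: "# {{senseid|<lang>|<gloss>}}<definition text>"
SENSEID_RE = re.compile(
    r"^#\s*\{\{senseid\|(?P<lang>[^|}]+)\|(?P<gloss>[^|}]+)\}\}(?P<text>.*)$"
)
# List item tied to a sense: "* {{sense|<gloss>}} [[word]] ..."
SENSE_RE = re.compile(r"^\*\s*\{\{sense\|(?P<gloss>[^|}]+)\}\}(?P<rest>.*)$")
# Wiki link: [[target]], [[target|label]] or [[target#anchor]]
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def extract_senses(wikitext):
    """Group definitions and sense-tagged links by their gloss."""
    senses = {}
    for line in wikitext.splitlines():
        match = SENSEID_RE.match(line)
        if match:
            entry = senses.setdefault(
                match["gloss"], {"definition": "", "linked": []}
            )
            entry["definition"] = match["text"].strip()
            continue
        match = SENSE_RE.match(line)
        if match:
            entry = senses.setdefault(
                match["gloss"], {"definition": "", "linked": []}
            )
            entry["linked"].extend(LINK_RE.findall(match["rest"]))
    return senses


if __name__ == "__main__":
    for gloss, entry in extract_senses(WIKITEXT).items():
        print(gloss, entry["linked"], entry["definition"])
```

Each pattern handles exactly one line shape, and lines that match
neither pattern (such as the {{trans-top}} line) are skipped. That
keeps the code simple, at the price of silently missing entries that do
not follow the expected layout.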

-- Andrew Krizhanovsky.

On Sun, Apr 7, 2013 at 8:53 AM, Sebastian Hellmann
<[email protected]> wrote:
> Hi Andrew,
> some statistics are in here:
> http://svn.aksw.org/papers/2012/JIST_Wiktionary/public.pdf
>
> I executed a SPARQL query on the store to do these statistics:
> http://downloads.dbpedia.org/wiktionary/stats_2013_04_06.csv
>
> We tried to honor ELE[1] for extraction, so most likely, if a
> Wiktionary page deviates from ELE, the results for it are not so good.
>
>
> I assume you are familiar with SPARQL, because of your D2R mapping for
> wikokit. Here is the query:
> Select ?g ?p count(?p) as ?count  where { Graph ?g { ?s ?p ?o } } group by
> ?p ?g order by desc (?g) desc(?count)
> It takes too long to run over http. If you are interested in more difficult
> statistics and calculations, I can also give you better access to our
> service (maybe even ssh access).
>
> All the best,
> Sebastian
>
> [1] https://en.wiktionary.org/wiki/Wiktionary:Entry_layout_explained
>
> On 05.04.2013 18:13, Andrew Krizhanovsky wrote:
>>
>> Thanks, Sebastian, for the quick reply.
>>
>>>> But these do not occur frequently. For senses these seem to be available
>>>> however...
>>
>> Can you count how many senses and synonyms were successfully
>> extracted from English Wiktionary and Russian Wiktionary,
>> i.e. how many senses and synonyms are available now in DBpedia Wiktionary?
>>
>> It would be interesting to compare with the number of senses and synonyms
>> extracted from Wiktionaries by the wikokit parser,
>> see http://code.google.com/p/wikokit/#Statistics
>>
>> Best regards,
>> Andrew.
>>
>> On Fri, Apr 5, 2013 at 5:57 PM, Sebastian Hellmann
>> <[email protected]>  wrote:
>>>
>>> Hi Andrew,
>>> actually the tools to solve this problem are in place:
>>> http://en.wiktionary.org/wiki/house#English-abode
>>> links to a sense, the highlighting is there, also if you go to Editing
>>> Gadgets you can enable  "Enable definition editing options." to add
>>> glosses.
>>> This was created by Yair_rand and it allows you to connect senses with
>>> the help of glosses such as "abode".
>>>
>>> However, this has not received any uptake by the Wiktionary community.
>>>
>>> The idea is to  have something like (on
>>> http://en.wiktionary.org/wiki/house#English-establishment)
>>> # {{senseid|en|establishment}}An [[establishment]], whether actual, as a
>>> pub, or virtual, as a website. Particularly restaurant, casino, or
>>> financial
>>> or trading company.
>>> ...
>>> *  {{sense|establishment}} [[shop]]
>>> ...
>>> {{trans-top|an establishment}}
>>>
>>> But these do not occur frequently. For senses these seem to be available
>>> however:
>>>
>>> http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1en
>>>
>>> Query:
>>> http://wiktionary.dbpedia.org/sparql
>>> select * where {Graph ?g {?s
>>> <http://wiktionary.dbpedia.org/terms/hasSynonym>  ?o } } limit 100
>>>
>>> All the best,
>>> Sebastian
>>>
>>> On 05.04.2013 11:23, Andrew Krizhanovsky wrote:
>>>
>>>> DBpedia Wiktionary is a very interesting project!
>>>>
>>>> Is it possible to get a list of synonyms for the first meaning of the
>>>> noun "dog" now?
>>>> http://en.wiktionary.org/wiki/dog#Synonyms
>>>>
>>>> Best regards,
>>>> Andrew Krizhanovsky.
>>>>
>>>> On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas
>>>> <[email protected]>  wrote:
>>>>>
>>>>> Hi Moutupsi,
>>>>>
>>>>> You should definitely take a look at DBpedia Wiktionary
>>>>> (http://dbpedia.org/Wiktionary).
>>>>> It supports everything you want and can be easily configured for other
>>>>> languages.
>>>>>
>>>>> Best,
>>>>> Dimitris
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 4:21 AM, Moutupsi Paul
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Greeting,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am a CS grad student from Data Science Lab Stony Brook
>>>>>> <https://sites.google.com/site/datascienceslab/> and I am writing
>>>>>> to request information about parsing multilingual Wiktionary data.
>>>>>> Our lab has been using Wikipedia data for quite a while now, but we are
>>>>>> really interested in taking advantage of the massive Wiktionary
>>>>>> content, which we feel, after proper parsing, can become a rich
>>>>>> multi-language corpus.
>>>>>>
>>>>>>
>>>>>>
>>>>>> But the big hurdle is a parsing tool. We have tried a few Wiktionary
>>>>>> parsing tools:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1. https://github.com/clbecker/perl-wiktionary-parser/
>>>>>>
>>>>>> 2. https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
>>>>>>
>>>>>> 3. https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parser
>>>>>>
>>>>>> 4. http://www.ukp.tu-darmstadt.de/software/jwktl/
>>>>>>
>>>>>>
>>>>>>
>>>>>> but none of them is ready to use or easy to extend in
>>>>>> multi-language mode. (I am currently trying to work with wikokit,
>>>>>> parser 2 above.)
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would appreciate any advice, suggestions, or pointers toward the best
>>>>>> available Wiktionary parser. We are mainly looking to extract
>>>>>> meanings, POS, examples, translations, etc. (more can never hurt).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any help is appreciated. Kindly let me know if further information is
>>>>>> needed.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Moutupsi
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wiktionary-l mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
>>>>>>
>>>>>> --
>>>>>> Dimitris Kontokostas
>>>>>> Department of Computer Science, University of Leipzig
>>>>>> Research Group: http://aksw.org
>>>>>> Homepage: http://aksw.org/DimitrisKontokostas
>>>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>>
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
