Thank you for the paper. I like the overview in this paper and the clear description of Wiktionary parsing difficulties.
At the beginning of wikokit development I considered using a finite-state machine to extract data, but it was too complex for me, and Wiktionary data formatting varies too much in kind and quality :) So I chose ordinary procedural programming with short regular expressions. But your project proves that finite-state machines can be used in non-trivial situations. Great!

-- Andrew Krizhanovsky.

On Sun, Apr 7, 2013 at 8:53 AM, Sebastian Hellmann <[email protected]> wrote:
> Hi Andrew,
> some statistics are in here:
> http://svn.aksw.org/papers/2012/JIST_Wiktionary/public.pdf
>
> I executed a SPARQL query on the store to do these statistics:
> http://downloads.dbpedia.org/wiktionary/stats_2013_04_06.csv
>
> We tried to honor ELE[1] for extraction, so most likely, if the
> Wiktionary page deviates from ELE, then results are not so good for it.
>
> I assume you are familiar with SPARQL, because of your D2R mapping for
> wikokit. Here is the query:
> Select ?g ?p count(?p) as ?count where { Graph ?g { ?s ?p ?o } } group by ?p ?g order by desc (?g) desc(?count)
> It takes too long to run over http. If you are interested in more difficult
> statistics and calculations, I can also give you better access to our
> service (maybe even ssh access).
>
> All the best,
> Sebastian
>
> [1] https://en.wiktionary.org/wiki/Wiktionary:Entry_layout_explained
>
> On 05.04.2013 18:13, Andrew Krizhanovsky wrote:
>>
>> Thanks, Sebastian, for the quick reply.
>>
>>>> But these do not occur frequently. For senses these seem to be available
>>>> however...
>>
>> Can you count how many senses and synonyms were successfully
>> extracted from English Wiktionary and Russian Wiktionary,
>> i.e. how many senses and synonyms are available now in DBpedia Wiktionary?
>>
>> It will be interesting to compare with the number of senses and synonyms
>> extracted from Wiktionaries by the wikokit parser,
>> see http://code.google.com/p/wikokit/#Statistics
>>
>> Best regards,
>> Andrew.
>>
>> On Fri, Apr 5, 2013 at 5:57 PM, Sebastian Hellmann
>> <[email protected]> wrote:
>>>
>>> Hi Andrew,
>>> actually the tools to solve this problem are in place:
>>> http://en.wiktionary.org/wiki/house#English-abode
>>> links to a sense, the highlighting is there, also if you go to Editing
>>> Gadgets you can enable "Enable definition editing options." to add
>>> glosses.
>>> This was created by Yair_rand and it allows you to connect senses with
>>> the help of glosses such as "abode".
>>>
>>> However, this has not received any uptake by the Wiktionary community.
>>>
>>> The idea is to have something like (on
>>> http://en.wiktionary.org/wiki/house#English-establishment)
>>> # {{senseid|en|establishment}}An [[establishment]], whether actual, as a pub, or virtual, as a website. Particularly restaurant, casino, or financial or trading company.
>>> ...
>>> * {{sense|establishment}} [[shop]]
>>> ...
>>> {{trans-top|an establishment}}
>>>
>>> But these do not occur frequently. For senses these seem to be available
>>> however:
>>>
>>> http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1en
>>>
>>> Query:
>>> http://wiktionary.dbpedia.org/sparql
>>> select * where {Graph ?g {?s <http://wiktionary.dbpedia.org/terms/hasSynonym> ?o } } limit 100
>>>
>>> All the best,
>>> Sebastian
>>>
>>> On 05.04.2013 11:23, Andrew Krizhanovsky wrote:
>>>
>>>> DBpedia Wiktionary is a very interesting project!
>>>>
>>>> Is it possible to get the list of synonyms for the first meaning of the
>>>> noun "dog" now?
>>>> http://en.wiktionary.org/wiki/dog#Synonyms
>>>>
>>>> Best regards,
>>>> Andrew Krizhanovsky.
>>>>
>>>> On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Moutupsi,
>>>>>
>>>>> You should definitely take a look at DBpedia Wiktionary
>>>>> (http://dbpedia.org/Wiktionary).
>>>>> It supports everything you want and can be easily configured for other
>>>>> languages.
>>>>>
>>>>> Best,
>>>>> Dimitris
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 4:21 AM, Moutupsi Paul
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I am a CS grad student from Data Science Lab Stony Brook
>>>>>> <https://sites.google.com/site/datascienceslab/> and I am dropping this
>>>>>> mail to request information about parsing multi-lingual Wiktionary data.
>>>>>> Our lab has been using Wikipedia data for quite a while now but we are
>>>>>> really interested in taking advantage of the massive Wiktionary content
>>>>>> which we feel, after proper parsing, can become a rich multi-language
>>>>>> corpus.
>>>>>>
>>>>>> But the big hurdle is a parsing tool. We have tried a few Wiktionary
>>>>>> parsing tools
>>>>>>
>>>>>> 1. https://github.com/clbecker/perl-wiktionary-parser/
>>>>>> 2. https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
>>>>>> 3. https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parser
>>>>>> 4. http://www.ukp.tu-darmstadt.de/software/jwktl/
>>>>>>
>>>>>> but none of them is available in a ready-to-use form or is easy to extend
>>>>>> to multiple languages. (I am currently trying to work with wikokit
>>>>>> (parser 2 above).)
>>>>>>
>>>>>> I would appreciate any advice, suggestion or redirection towards the best
>>>>>> available Wiktionary parser. We are mainly looking to extract meanings,
>>>>>> POS, examples, translations etc. (more can never hurt).
>>>>>>
>>>>>> Any help is appreciated. Kindly let me know if further information is
>>>>>> needed.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Moutupsi
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wiktionary-l mailing list
>>>>>> [email protected]
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
>>>>>>
>>>>>> --
>>>>>> Dimitris Kontokostas
>>>>>> Department of Computer Science, University of Leipzig
>>>>>> Research Group: http://aksw.org
>>>>>> Homepage: http://aksw.org/DimitrisKontokostas
>>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>>
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
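
A note on the markup discussed in this thread: the {{senseid}} / {{sense}} pairing quoted from the "house" entry fits the "procedural code with short regular expressions" style mentioned at the top. Below is only a minimal Python sketch under assumptions. The function name extract_senses, the regular expressions, and the sample text (trimmed from the quoted example) are illustrative; none of this is taken from wikokit or DBpedia Wiktionary, and real entries vary far more than this handles.

```python
import re
from collections import defaultdict

# Sample wikitext trimmed from the "house" example quoted in the thread.
SAMPLE = """\
# {{senseid|en|establishment}}An [[establishment]], whether actual, as a pub, or virtual, as a website.
* {{sense|establishment}} [[shop]]
{{trans-top|an establishment}}
"""

# Hypothetical patterns, written for this sketch only.
SENSEID_RE = re.compile(r"^#\s*\{\{senseid\|([^|}]+)\|([^|}]+)\}\}(.*)$")
SENSE_RE = re.compile(r"^\*\s*\{\{sense\|([^|}]+)\}\}(.*)$")
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def strip_links(text):
    """Replace [[target]] or [[target|label]] with the visible text."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), text)


def extract_senses(wikitext):
    """Return (definitions, synonyms), both keyed by the sense gloss."""
    definitions = {}
    synonyms = defaultdict(list)
    for line in wikitext.splitlines():
        m = SENSEID_RE.match(line)
        if m:
            _lang, gloss, text = m.groups()
            definitions[gloss] = strip_links(text).strip()
            continue
        m = SENSE_RE.match(line)
        if m:
            gloss, rest = m.groups()
            # Collect link targets on the synonym line under the same gloss.
            synonyms[gloss].extend(l.group(1) for l in LINK_RE.finditer(rest))
    return definitions, dict(synonyms)


if __name__ == "__main__":
    defs, syns = extract_senses(SAMPLE)
    print(defs)
    print(syns)
```

Because the gloss ("establishment") appears both on the definition line and on the synonym line, the two dictionaries can be joined on that key. As noted earlier in the thread, this markup is rare in practice, so a sketch like this would cover only a small share of entries.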
