Lukasz Bolikowski wrote:

> I have just started my PhD research on synchronizing knowledge
> between language versions of semantically-annotated Wikipedia
> (suggesting updates and corrections in individual versions based
> on the consensus knowledge).  One of the problems is to obtain
> (or create) semantic annotation in a couple of languages, ideally
> the four most popular ones on Wikipedia: English, German, French,
> and Polish.

I had the idea to translate facts relying on interlanguage links.

If each page has these, e.g. [[Attribute:Population density]] has
[[fr:Attribut:Densité de population]] and [[Relation:Surrounded by]] has 
[[fr:Relation:Entouré par]],
then you can perform machine translation of a page's facts from the 
language it was written in to another.  All you need are the 
interlanguage links for each article in the triple; the pages in other 
languages don't have to exist.
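
As a rough illustration of the idea (a standalone Python sketch, not SMW 
code), translating a triple amounts to looking up each of its three 
titles.  The LANGLINKS dictionary and both function names are made up 
for this example, and the Lesotho/South Africa entries are illustrative 
data only:

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object) page titles

# Hypothetical interlanguage links, keyed by (source title, target language).
# The two property entries are the examples given in the text above.
LANGLINKS: Dict[Tuple[str, str], str] = {
    ("Attribute:Population density", "fr"): "Attribut:Densité de population",
    ("Relation:Surrounded by", "fr"): "Relation:Entouré par",
    ("Lesotho", "fr"): "Lesotho",
    ("South Africa", "fr"): "Afrique du Sud",
}


def translate_title(title: str, lang: str) -> Optional[str]:
    """Return the linked title in `lang`, or None if no link exists."""
    return LANGLINKS.get((title, lang))


def translate_triple(triple: Triple, lang: str) -> Optional[Triple]:
    """Translate every title in the triple; give up if any link is missing."""
    subject, prop, obj = (translate_title(t, lang) for t in triple)
    if subject is None or prop is None or obj is None:
        return None
    return (subject, prop, obj)


fact = ("Lesotho", "Relation:Surrounded by", "South Africa")
print(translate_triple(fact, "fr"))
```

Note that only the links are consulted: nothing here checks whether the 
French pages themselves exist.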

I wasn't able to get this to work well.  I modified 
SMW_SpecialExportRDF.php to translate a given page, but it wasn't a good 
framework and I didn't have many local pages with translations.  I 
couldn't figure out how to reuse the appearance of the factbox when I 
was providing the property values from database queries instead of the 
page creating them from wikitext parsing.  I didn't fully understand how 
MediaWiki's language selection and translation worked.

If this approach worked, you could use such a machine translator as a 
translation assistant to add facts to SMW pages in other languages.  I 
would use it simply to display what other languages currently think are 
the facts about a topic, e.g. Pope John Paul II in Polish or ski racers 
in Austrian.

> Note that I don't want to agree on a common ontology, just on
> the technical things.  I'd like to reproduce the chaos that
> is likely to emerge when each Wikipedia starts the semantic
> annotation on its own.  "Aligning" knowledge from multiple
> not-entirely-compatible ontologies is a much more interesting
> research problem.

Some people dream of a corpus of facts expressed in a canonical 
language that are then translated into each language on the fly.  I 
think there was a meta-wiki commons project to do something like this 
for page translations and category names; obviously you get disagreement 
about which language is canonical, which facts (if any) are culturally 
neutral, etc.

I can send my SMW_SpecialTranslateFacts page code to anyone interested 
in bad code ;-) .  After getting the triples for attributes, relations, 
and special properties for a page (much like Special:ExportRDF), it 
translates each title in the triple by querying the MediaWiki 
'langlinks' table:

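     // Database handle.  A read-only lookup like this one could also
     // use DB_SLAVE instead of DB_MASTER.
     $this->db = & wfGetDB( DB_MASTER );

     // Condition restricting langlinks rows to the target language.
     $this->lang_sql = 'll_lang=' . $this->db->addQuotes($this->language);

     // Look up the interlanguage link for this page in that language.
     $res = $export->db->select($export->db->tableName('langlinks'),
                           'll_title',
                           'll_from=' . $export->db->addQuotes($this->title_id) .
                           ' AND ' . $export->lang_sql,
                           'SMWTranslateFactsTitle::GetTranslatedTitle');

     while($row = $export->db->fetchObject($res)) {
         // Escape the title before echoing it into the HTML output.
         echo "Found translated title " . htmlspecialchars($row->ll_title) . "<br />\n";
         $this->title_translated = $row->ll_title;
         $this->has_translation = true;
     }
     $export->db->freeResult($res);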
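
To try the same lookup outside MediaWiki, here is a minimal sketch in 
Python using an in-memory SQLite table.  It assumes only the three 
'langlinks' columns used above (ll_from, ll_lang, ll_title); the page 
id 42, the sample row and the function name get_translated_title are 
invented for the example:

```python
import sqlite3
from typing import Optional


def get_translated_title(conn: sqlite3.Connection,
                         page_id: int, lang: str) -> Optional[str]:
    """Return the interlanguage-link title for page_id in lang, if any."""
    row = conn.execute(
        "SELECT ll_title FROM langlinks WHERE ll_from = ? AND ll_lang = ?",
        (page_id, lang),
    ).fetchone()
    return row[0] if row else None


# Stand-in for the wiki database: one made-up langlinks row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE langlinks (ll_from INTEGER, ll_lang TEXT, ll_title TEXT)"
)
conn.execute(
    "INSERT INTO langlinks VALUES (?, ?, ?)",
    (42, "fr", "Relation:Entouré par"),
)

print(get_translated_title(conn, 42, "fr"))
print(get_translated_title(conn, 42, "pl"))
```

The bound parameters play the role of addQuotes() in the PHP above, and 
a missing link simply comes back as None rather than leaving 
has_translation unset.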

Regards,
--
=S
