On 27.11.11 23:47, Karl Pflästerer wrote:
> On 27.11.11 23:23, Yannick Torrès wrote:
>> 2011/11/27 Karl Pflästerer <k...@rl.pflaesterer.de>:
>>> Hi,
>>
>> Hi,
>>
>>> forgive me if I ask something which has already been discussed, but I've
>>> seen nothing in the archives.
>>>
>>> I'm trying to help translate some of the docs and saw this box at
>>> https://edit.php.net/:
>>>
>>> Check for errors in /language-snippets.ent
>>>
>>> The content of that box seems to be computed by the class
>>> http://svn.php.net/repository/web/doc-editor/trunk/php/ToolsError.php
>>>
>>> There is a method attributLinkTag().
>>>
>>> To compare the linkend attributes of the <link> tags, it uses a regex:
>>>
>>> $reg = '/<link\s*?linkend=("|\')(.*?)("|\')\s*?>/s';
>>>
>>> As you can see, only whitespace is allowed between <link and the linkend
>>> attribute. But, for example, in the German translation (and also in the
>>> English documentation) some <link> tags have another attribute between
>>> the element name and "linkend".
>>
>> Could you please give me an example of this case?
>
> From en/language-snippets.ent:
>
> <!ENTITY seealso.array.sorting 'The <link
> xmlns="http://docbook.org/ns/docbook" linkend="array.sorting">comparison of
> array sorting functions</link>'>
>
> <!ENTITY seealso.callback 'information about the <link
> xmlns="http://docbook.org/ns/docbook"
> linkend="language.types.callback">callback</link> type'>
>
> There are more examples in the German translation (some of them IMHO wrong,
> since they duplicate the xmlns attribute), but I'm not sure whether such a
> simple difference should trigger such an error.
>
>>
>>> An easy fix would be
>>> $reg = '/<link[^<>]+linkend=("|\')(.*?)("|\')[^<>]*>/s';
>>>
>>> But that would solve only half of the problem. The other half is that the
>>> check script needs the entities in the same order in both files, and it
>>> compares the found links only by their position in the two match arrays.
>>> So, e.g., one extra link in the translation produces spurious mismatches
>>> for all following entries.
>>
>> Yes, it is.
>> The goal here is to check each file and warn even when there is only one
>> difference, even if it is an order problem (this can be a translation
>> error too).
>
> OK. (But in a file that contains only entity definitions, the order
> shouldn't matter, should it?)
>
>>
>>> Does it make sense to rewrite that algorithm so that it compares each
>>> entity in the English original with the translation, so that we get
>>> more precise errors?
>>
>> You mean to avoid the order check?
>> Perhaps we can do this, yes: check the number of these tags, and check
>> that all of them are present, even if the order is not respected.
>
> I was thinking of checking each entity definition: instead of doing a
> simple preg_match_all and comparing $match_en[1] to $match_lang[1],
> compare the linkend attributes of each entity definition in en and $lang.
>
> Then the error could be: Difference in linkend attribute in entity xyz.
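The difference between the current regex and the proposed fix can be checked directly against the seealso.array.sorting entity quoted above. This is a minimal standalone PHP sketch, not code taken from ToolsError.php; the variable names ($reg_old, $reg_new, $snippet) are made up for illustration, while both patterns are copied unchanged from the thread.

```php
<?php
// Pattern currently used by attributLinkTag() (as quoted in the thread):
// only optional whitespace may appear between "<link" and "linkend".
$reg_old = '/<link\s*?linkend=("|\')(.*?)("|\')\s*?>/s';

// Proposed fix: one or more characters other than < or > may appear
// before "linkend", and any such characters after its closing quote.
$reg_new = '/<link[^<>]+linkend=("|\')(.*?)("|\')[^<>]*>/s';

// Hypothetical test string modelled on the seealso.array.sorting entity.
$snippet = 'The <link xmlns="http://docbook.org/ns/docbook" linkend="array.sorting">'
         . 'comparison of array sorting functions</link>';

$old_count = preg_match_all($reg_old, $snippet, $m_old);
$new_count = preg_match_all($reg_new, $snippet, $m_new);

echo "old regex: $old_count match(es)\n";
echo "new regex: $new_count match(es)";
if ($new_count > 0) {
    // Group 2 holds the linkend value in both patterns.
    echo ', linkend = ' . $m_new[2][0];
}
echo "\n";
```

With the old pattern the <link> element is not found at all (the whitespace after <link is followed by xmlns, not linkend), while the proposed pattern finds it and captures array.sorting in the second group.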
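To be a little bit more concrete, here is a code example (it's just a proof of concept):

<?php
// Collect, for every <!ENTITY ...> definition in $s, the values of the
// linkend attributes of its <link> and <xref> elements.
function extract_linkend($s)
{
    // /x: whitespace in the pattern is ignored; /s: "." also matches newlines.
    $rx_linkend = '
        /
        <(?: link | xref) [^<>]+ linkend=(?:"|\') (.*?) (?:"|\') [^<>]* >
        /xs';
    // Each match runs from "<!ENTITY name" up to the next "<!ENTITY" or the end.
    $rx_entities = '/(<!ENTITY\s+(\S+).+?)(?=(?:<!ENTITY|$))/s';

    preg_match_all($rx_entities, $s, $m_entities, PREG_SET_ORDER);
    $linkend_by_entity = array();
    foreach ($m_entities as $entity) {
        preg_match_all($rx_linkend, $entity[1], $m_linkend);
        // Only keep entities that actually contain links.
        if ($m_linkend[1]) {
            $linkend_by_entity[$entity[2]] = $m_linkend[1];
        }
    }
    return $linkend_by_entity;
}

$link_de = extract_linkend(file_get_contents('language-snippets.ent'));
$link_en = extract_linkend(file_get_contents('../en/language-snippets.ent'));

// Keep EN entities that are missing in DE, or whose linkend list contains
// a value the DE list lacks (extra links in DE are not reported).
$diff = array_udiff_assoc($link_en, $link_de,
    function ($en, $lang) {
        return array_diff($en, $lang) ? 1 : 0;
    }
);

foreach ($diff as $entity => $linkends) {
    // Guard against entities that are absent from the translation.
    $de = isset($link_de[$entity]) ? $link_de[$entity] : array();
    echo "Entity: $entity\n";
    echo 'EN: ' . join('; ', $linkends), "\n";
    echo 'DE: ' . join('; ', $de), "\n\n";
}

If I run that (with the German (de) translation), I get:

Entity: ini.php.constants
EN: configuration.changes.modes
DE: ini

Entity: mysqli.available.mysqlnd
EN: book.mysqlnd
DE: mysqli.overview.mysqlnd

That could be helpful (IMHO).

KP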
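Yannick's suggestion (count the tags and check that all of them are present, regardless of order) can also be applied per entity. The following is a minimal PHP sketch under that assumption, not part of the POC above or of ToolsError.php; compare_linkends(), $report and the sample data are hypothetical. Unlike the array_diff() callback in the POC, it also reports links that appear only in the translation (or appear there more often), and it treats an entity missing from the translation as having no links.

```php
<?php
// Order-independent, count-aware comparison of the linkend targets of one
// entity (a multiset comparison). Returns array($missing, $extra): targets
// the translation has fewer times than EN, and targets it has more times.
function compare_linkends(array $en, array $lang)
{
    $count_en   = array_count_values($en);
    $count_lang = array_count_values($lang);
    $missing = array();
    $extra   = array();
    foreach ($count_en as $target => $n) {
        $have = isset($count_lang[$target]) ? $count_lang[$target] : 0;
        if ($have < $n) {
            $missing[] = $target;
        }
    }
    foreach ($count_lang as $target => $n) {
        $want = isset($count_en[$target]) ? $count_en[$target] : 0;
        if ($n > $want) {
            $extra[] = $target;
        }
    }
    return array($missing, $extra);
}

// Hypothetical input, shaped like the arrays extract_linkend() returns
// (entity name => list of linkend values).
$link_en = array(
    'seealso.array.sorting' => array('array.sorting'),
    'ini.php.constants'     => array('configuration.changes.modes'),
    'seealso.callback'      => array('language.types.callback'),
);
$link_de = array(
    'seealso.array.sorting' => array('array.sorting', 'array.sorting'),
    'ini.php.constants'     => array('ini'),
    // 'seealso.callback' is absent from this translation.
);

// Entities that exist only in the translation are not checked here.
$report = array();
foreach ($link_en as $entity => $targets) {
    // An entity missing from the translation is compared against an empty list.
    $lang = isset($link_de[$entity]) ? $link_de[$entity] : array();
    list($missing, $extra) = compare_linkends($targets, $lang);
    if ($missing || $extra) {
        $report[$entity] = array('missing' => $missing, 'extra' => $extra);
    }
}

foreach ($report as $entity => $d) {
    echo "Difference in linkend attribute in entity $entity\n";
    echo '  missing in translation: ' . implode('; ', $d['missing']) . "\n";
    echo '  extra in translation:   ' . implode('; ', $d['extra']) . "\n";
}
```

Because the counts are compared rather than positions, reordering the links inside an entity produces no warning, while an added, dropped or changed link does.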