Hi

I created a patch for this problem here:
https://sourceforge.net/tracker/?func=detail&aid=2813298&group_id=93107&atid=603140
I don't know the best way to submit code changes, so I am copying
the patch here as well. How should I proceed in the future when I have a
change to suggest?

I simply changed \b to \<, which works in my case. The problem only arose
with words starting with "deutsch", so those are the only patterns I changed:

--- ../pywikipedia/fixes.py     2009-06-27 19:00:52.000000000 +0200
+++ fixes.py    2009-06-27 19:46:27.000000000 +0200
@@ -293,10 +295,10 @@
         },
         'replacements': [
             (r'\batlantische(r|n|) Ozean', r'Atlantische\1 Ozean'),
-            (r'\bdeutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
-            (r'\bdeutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
-            (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
-            (r'\bdeutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
+            (r'\<deutsche(r|n|) Bundestag\b', r'Deutsche\1 Bundestag'),
+            (r'\<deutschen Bundestags\b', r'Deutschen Bundestags'), # Aufpassen, z. B. 'deutsche Bundestagswahl'
+            (r'\<deutsche(r|n|) Reich\b', r'Deutsche\1 Reich'), #Aufpassen z. B. 'Großdeutsches Reich'
+            (r'\<deutschen Reichs\b', r'Deutschen Reichs'), # Aufpassen, z. B. 'deutsche Reichsgrenzen'
             (r'\bdritte(n|) Welt(?!krieg)', r'Dritte\1 Welt'),
             (r'\bdreißigjährige(r|n|) Krieg', r'Dreißigjährige\1 Krieg'),
             (r'\beuropäische(n|) Gemeinschaft', r'Europäische\1 Gemeinschaft'),
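
For illustration, here is a minimal sketch of how these patterns behave in
Python's standard re module. It rests on assumptions that are not confirmed
in this thread: that the unwanted edit comes from ASCII-only word boundaries
(emulated here in Python 3 with re.ASCII), and that a string like
'Großdeutsche Reich' triggered it. The sample text and variable names are
made up for the example. It also checks how re treats \<, and tries the
lookbehind idea from the quoted reply below, adapted to keep the (r|n|)
group that the replacement's \1 needs.

```python
import re

REPLACEMENT = r'Deutsche\1 Reich'
# Hypothetical sample text, chosen to contain both cases.
text = 'das Großdeutsche Reich und das deutsche Reich'

# re.ASCII: only [a-zA-Z0-9_] count as word characters, so 'ß' is a
# non-word character and \b matches between 'ß' and 'd'.
ascii_b = re.compile(r'\bdeutsche(r|n|) Reich\b', re.ASCII)
ascii_result = ascii_b.sub(REPLACEMENT, text)

# Unicode-aware matching (the default for str patterns in Python 3):
# 'ß' is a word character, so there is no boundary inside 'Großdeutsche'.
unicode_b = re.compile(r'\bdeutsche(r|n|) Reich\b')
unicode_result = unicode_b.sub(REPLACEMENT, text)

# In Python's re, \< is not a start-of-word anchor; it is an escaped
# literal '<', so this pattern only matches text containing '<deutsche'.
angle = re.compile(r'\<deutsche(r|n|) Reich\b')
angle_result = angle.sub(REPLACEMENT, text)

# Lookbehind workaround: refuse a match that directly follows 'ß',
# even when word boundaries are ASCII-only.
lookbehind = re.compile(r'(?<!ß)\bdeutsche(r|n|) Reich\b', re.ASCII)
lookbehind_result = lookbehind.sub(REPLACEMENT, text)

print(ascii_result)
print(unicode_result)
print(angle_result)
print(lookbehind_result)
```

If the assumption about ASCII-only boundaries holds, compiling the patterns
with Unicode-aware matching (re.UNICODE on Unicode strings in Python 2) may
be an alternative to rewriting each pattern.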



Greetings

Hannes

2009/6/19 Francesco Cosoleto:
> Hannes Röst wrote:
>> Hello
>>
>> I am writing for the first time and I don't quite know where the
>> appropriate place is to write this. I am working on the German
>
> Originally this mailing list was named "pywikipediabot-users"; nowadays
> it looks more like a development mailing list.
>
>> Wikipedia and I ran into some problems using fixes.py. Specifically, I
>> had this edit:
>> http://de.wikipedia.org/w/index.php?title=Deutsches_Reich_1933_bis_1945&diff=prev&oldid=61255346
>>
>> the problem is here:
>> (r'\bdeutsche(r|n|) Reich\b', r'Deutsche\1 Reich'),
>>
>> It seems to be the case that \b does not work with the German eszett,
>> whereas \< does work in my case. Should this be changed in all cases
>> where \b is used? Do you have other suggestions?
>
> I am surprised to see that. I guess that is because the German eszett may be
> used in a different context. I am not sure it is worth a bug report to
> Python; other software (like grep) doesn't work with this regexp either.
>
> A possible workaround would be this:
>
> ur'(?<!\xdf)\bdeutsche[rn] Reich\b'
>
> --
> Francesco Cosoleto
>
> "So let no one turn back
> toward the ships once he has heard the call,
> but go forward and urge one another on,
> in case Olympian Zeus, hurler of lightning, should grant us
> to beat back the assault and drive the enemy back to the city". (Homer)
>
>
> _______________________________________________
> Pywikipedia-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>

