[Wikidata-bugs] [Maniphest] T325988: MonthNameUnlocalizer may unlocalize dates in an undesired way

2023-03-03 Thread matej_suchanek
matej_suchanek added a project: Patch-For-Review.
matej_suchanek assigned this task to Lucas_Werkmeister_WMDE.

TASK DETAIL
  https://phabricator.wikimedia.org/T325988

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, matej_suchanek
Cc: Michael, Lucas_Werkmeister_WMDE, Aklapper, matej_suchanek, Themindcoder, 
Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, Biggs657, 
karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, 
GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org



2023-03-03 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Proposed fix: https://github.com/wmde/Time/pull/167


2023-02-01 Thread matej_suchanek
matej_suchanek added a comment.


  I believe it doesn't. This problem occurs in `PhpDateTimeParser`, which is 
used after the other two fixed parsers. So as long as they "catch" all these 
dates (and I hope they now do), `PhpDateTimeParser` simply won't be used and 
this bug should not happen.
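
  A minimal sketch of why the bug can stay hidden, assuming the parsers are tried in order and the first one that succeeds wins. The closures and the `parseWithFallback` helper are hypothetical stand-ins for illustration, not the actual Wikibase parser classes:

```php
<?php
// Hypothetical sketch: try each parser in order; the first non-null result wins.
function parseWithFallback(string $input, array $parsers): ?array
{
    foreach ($parsers as $name => $parse) {
        $result = $parse($input);
        if ($result !== null) {
            return ['parser' => $name, 'value' => $result];
        }
    }
    return null;
}

$parsers = [
    // Stand-in for a parser that handles all-numeric "D. M. YYYY" input directly.
    'numericDayMonthYear' => static function (string $input): ?string {
        if (preg_match('/^(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})$/', $input, $m) !== 1) {
            return null;
        }
        return sprintf('%04d-%02d-%02d', $m[3], $m[2], $m[1]);
    },
    // Stand-in for the last-resort parser that would unlocalize month names.
    'fallback' => static function (string $input): ?string {
        return 'unlocalize-and-parse: ' . $input;
    },
];

echo parseWithFallback('5. 4. 1891', $parsers)['parser'], "\n";        // numericDayMonthYear
echo parseWithFallback('28. prosinec 2022', $parsers)['parser'], "\n"; // fallback
```

  In this model the all-numeric date never reaches the fallback, so a defect in the fallback's month replacement would not show up for it.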


2023-02-01 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  We should probably fix this in either case, but do you know if this still 
results in any broken parses when your change to construct parsers with a copy 
of ParserOptions is applied? When I try it locally, that patch seems to fix all 
the tests I’ve tried so far (including `01.02.1997`, `5. 4. 1891`, `4. 5. 1891`, 
`07.05.1997`), but that might be because my local wiki is on PHP 8.1, not 7.4.
  
  If this needs a separate fix, then I think we should try to merge that fix 
(to be created) and your “construct parsers” change at the same time, so that 
they get deployed together. If your change alone is enough to fix the issue, 
then we can maybe merge that without waiting for a fix here.


2023-01-09 Thread matej_suchanek
matej_suchanek added a comment.


  "all-numeric" should already be handled in both TimeParserFactory and 
MonthNameUnlocalizer, if I got it right.
  
  Overview of languages in which a month starts with a digit.


2023-01-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Yeah, that’s true, I lost sight of that while pondering the potential greater 
horrors of `5. 4. 1891` not even parsing the same (broken) way all the time.
  
  The `MonthNameUnlocalizer` should probably skip any replacements that… only 
contain numbers and punctuation? don’t contain any letters? Not sure what the 
exact criteria should be. (Note that the “bad” replacement for e.g. January is 
`1.` in cs but `1` in ko, so “all-numeric” will fix ko but not cs.)
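
  One possible criterion, sketched under the assumption that "contains at least one letter" is the rule chosen; this is not necessarily what the eventual fix does. The helper name `dropLetterlessReplacements` is hypothetical, and the map mirrors the name => month number shape shown for `getMonthNumbers()`:

```php
<?php
// Hypothetical sketch: keep only search strings that contain at least one
// Unicode letter, so "1." (cs) and "1" (ko) are both dropped.
function dropLetterlessReplacements(array $replacements): array
{
    return array_filter(
        $replacements,
        // PHP turns the key "1" into the integer 1, hence the string cast.
        static fn($search): bool => preg_match('/\p{L}/u', (string)$search) === 1,
        ARRAY_FILTER_USE_KEY
    );
}

$january = ['leden' => 1, 'ledna' => 1, '1.' => 1, '1' => 1];
print_r(array_keys(dropLetterlessReplacements($january)));
// Keeps "leden" and "ledna"; drops "1." and "1".
```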


2023-01-06 Thread matej_suchanek
matej_suchanek added a comment.


  > But in the case of `5.` vs. `4.`, the length is the same; since PHP 8.0, 
uksort retains the original order in that case, but in production (PHP 7.4, 
until T319432) the sort is not stable and may apparently swap the dates around 
arbitrarily.
  
  Let me just add that even if the sorting algorithm were stable, the problem 
would still be there. If `4.` //always// came first, `5. 4. 1891` would be 
replaced as `5. April 1891` (correct), but `4. 5. 1891` would be replaced as 
`April 5. 1891` (wrong). And so on.


2023-01-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  I think this only affects Czech and Korean, though, or at least I haven’t 
found purely numeric abbreviated month messages (`jan` etc. as the message key) 
in other languages.


2023-01-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  > The abbreviated month is always a number followed by a dot. Therefore, if 
the input is in the DD. MM. format, the day may be replaced instead of the 
month (since the string is scanned left-to-right during replacement).
  > For example, `5. 4. 1891` (April 5th, 1891) can be replaced as both `5. 
April 1891` (parsed correctly) and `May 4. 1891` (parsed with day and month 
swapped). **In general, this depends on which comes first.**
  
  And Wikibase tries to make the longer replacements first:
  
  MonthNameUnlocalizer::__construct():
  
    // Order search strings from longest to shortest
    uksort( $this->replacements, static function ( $a, $b ) {
        return strlen( $b ) - strlen( $a );
    } );
  
  But in the case of `5.` vs. `4.`, the length is the same; since PHP 8.0, 
uksort retains the original order in that case, but in production (PHP 7.4, 
until T319432) the sort is not stable and may apparently swap the dates around 
arbitrarily. I tried it out in `shell.php` 
(`$sorted = ( new \Wikibase\Repo\Parsers\MediaWikiMonthNameProvider() 
)->getMonthNumbers( 'cs' ); uksort( $sorted, fn( $a, $b ) => strlen( $b ) - 
strlen( $a ) ); $sorted`), and the result seems to be at least consistent 
across calls (including across different PHP processes) – 11, 10, 12, 9, 8, 7, 
5, 4, 3, 2, 1, 6. But still, this is pretty horrible.
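
  For illustration only, a sketch of a comparator whose result does not depend on whether the sort is stable: ties in length are broken by comparing the keys themselves. The sample map is made up for the example, and a deterministic order would not by itself resolve the day/month ambiguity:

```php
<?php
// Sketch: longest search string first; equal lengths are ordered by strcmp(),
// so the outcome is the same on PHP 7.4 and PHP 8.x.
$replacements = [
    'leden' => 1, 'ledna' => 1, '5.' => 5,
    'duben' => 4, 'dubna' => 4, '4.' => 4,
];

uksort($replacements, static function ($a, $b): int {
    $a = (string)$a; // keys such as "1" arrive as integers
    $b = (string)$b;
    return (strlen($b) <=> strlen($a)) ?: strcmp($a, $b);
});

echo implode(', ', array_keys($replacements)), "\n";
// duben, dubna, leden, ledna, 4., 5.
```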


2022-12-28 Thread matej_suchanek
matej_suchanek created this task.
matej_suchanek added a project: Wikidata.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  MonthNameUnlocalizer is used by PhpDateTimeParser to unlocalize month names 
from user inputs by replacing them with the English ones text-wise. The result 
is then parsed using PHP's `DateTime` object.
  
  For example, `28. prosinec 2022` is replaced with `28. December 2022`, which 
PHP can understand.
  
  In production, MonthNameUnlocalizer's replacements are populated using 
`MediaWikiMonthNameProvider`, which for each month looks up its name, its 
genitive form, and its abbreviation in the given language.
  
  In Czech (cs), the replacements are as follows:
  
> $provider = new \Wikibase\Repo\Parsers\MediaWikiMonthNameProvider();
> $provider->getMonthNumbers( 'cs' );
= [
"leden" => 1,
"ledna" => 1,
"1." => 1,
"únor" => 2,
"února" => 2,
"2." => 2,
"březen" => 3,
"března" => 3,
"3." => 3,
...
]
  
  The abbreviated month is always a number followed by a dot. Therefore, if the 
input is in the DD. MM. format, the day may be replaced instead of the month 
(since the string is scanned left-to-right during replacement).
  For example, `5. 4. 1891` (April 5th, 1891) can be replaced as both `5. April 
1891` (parsed correctly) and `May 4. 1891` (parsed with day and month swapped). 
In general, this depends on which comes first.
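
  The order dependence can be reproduced with a simplified model. This is a sketch, not the actual MonthNameUnlocalizer code: it assumes that search strings are tried in array order and that the first one found in the input is replaced once. The helper name `unlocalizeSketch` and the map to English names are hypothetical:

```php
<?php
// Simplified model: replace the first search string (in array order) that
// occurs in the input, then stop.
function unlocalizeSketch(string $date, array $replacements): string
{
    foreach ($replacements as $search => $english) {
        $search = (string)$search;
        $pos = strpos($date, $search);
        if ($pos !== false) {
            return substr_replace($date, $english, $pos, strlen($search));
        }
    }
    return $date;
}

$aprilFirst = ['4.' => 'April', '5.' => 'May'];
$mayFirst   = ['5.' => 'May', '4.' => 'April'];

echo unlocalizeSketch('5. 4. 1891', $aprilFirst), "\n"; // 5. April 1891
echo unlocalizeSketch('5. 4. 1891', $mayFirst), "\n";   // May 4. 1891
echo unlocalizeSketch('4. 5. 1891', $aprilFirst), "\n"; // April 5. 1891
```

  With a fixed order, one of the two inputs `5. 4. 1891` and `4. 5. 1891` always ends up with the day replaced instead of the month.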
  
  In case the day is also zero-padded (e.g., `07.05.1997`), the replacement 
ignores the zeros and may create either `07. 0May 1997` or `0July 05. 1997`. 
PhpDateTimeParser then transforms them to either `07.0May.1997` or 
`0July.05.1997` and lets PHP parse them.
  The result seems to depend on PHP version. In production (PHP 7.4), the date 
is parsed as June 30th, 1997 (1997-07-00 -> 1997-06-30). On PHP 8.1, it is 
considered invalid.
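
  The last step of that chain (1997-07-00 -> 1997-06-30) is PHP's documented handling of day 0 as the last day of the previous month. The sketch below shows only that rollover on a plain ISO date; it does not reproduce how the garbled string itself is parsed:

```php
<?php
// Day "00" underflows to the last day of the previous month.
$utc = new DateTimeZone('UTC');
$date = new DateTimeImmutable('1997-07-00', $utc);

echo $date->format('Y-m-d'), "\n"; // 1997-06-30
```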
