https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Andre Klapper changed:
What|Removed |Added
Priority|High|Normal
--- Comment #29 from Andre Klap
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #28 from Gerrit Notification Bot ---
Change 22831 abandoned by Hashar:
(bug 27849) Add normalized info for Unicode normalization of titles
Reason:
Cleaning up very old change. Feel free to resurrect if there is any interest in
fini
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Nemo changed:
What|Removed |Added
CC||federicol...@tiscali.it
See Also|
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Target Milestone|1.20.0 release |Future release
--- Comment #26 f
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #25 from Roan Kattouw 2012-09-05 21:40:50 UTC ---
Moved patch into Gerrit, see https://gerrit.wikimedia.org/r/#/c/22831/ . It
doesn't actually work yet, because the unnormalized data needs to be armored to
bypass ApiResult::cleanUp
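The patch's goal is to report which input titles were changed by Unicode normalization. Below is a minimal PHP sketch of that comparison. It assumes the raw (pre-NFC) titles are available at all, which comments #1 to #4 below show is not the case by default. The function name `buildUnicodeNormalizedList()`, the `from`/`to` keys and the injected `$toNFC` callable are illustrative assumptions, not the abandoned change's code. In MediaWiki the callable might be `UtfNormal::toNFC`, or `Normalizer::normalize` from PHP's intl extension.

```php
<?php
/**
 * Hypothetical helper: for each raw title whose NFC form differs,
 * record a from/to pair. $toNFC is injected so the sketch does not
 * depend on the intl extension or on MediaWiki's UtfNormal class.
 *
 * @param string[] $rawTitles titles exactly as the client sent them
 * @param callable(string): string $toNFC
 * @return array<int, array{from: string, to: string}>
 */
function buildUnicodeNormalizedList(array $rawTitles, callable $toNFC): array
{
    $normalized = [];
    foreach ($rawTitles as $raw) {
        $nfc = $toNFC($raw);
        if ($nfc !== $raw) {
            // Only titles that actually changed are reported.
            $normalized[] = ['from' => $raw, 'to' => $nfc];
        }
    }
    return $normalized;
}

// Toy stand-in normalizer: only composes "e" + U+0301 into U+00E9.
$toyNFC = fn(string $s): string => str_replace("e\u{0301}", "\u{00E9}", $s);
print_r(buildUnicodeNormalizedList(["Cafe\u{0301}", "Paris"], $toyNFC));
```

Injecting the normalizer keeps the comparison logic independent of which NFC implementation (or which extra per-language normalization) a given wiki uses.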
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Roan Kattouw changed:
What|Removed |Added
AssignedTo|roan.katt...@gmail.com |wikibugs-l@lists.wikimedia.
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Sam Reed (reedy) changed:
What|Removed |Added
Target Milestone|1.19.0 release |1.20.0 release
--
Configure bugmai
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Blocks|29097 |
Target Milestone|---
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Aaron Schulz changed:
What|Removed |Added
CC||schulzaaro...@yahoo.de
--- Comment #24
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Sumana Harihareswara changed:
What|Removed |Added
Keywords||reviewed
CC|
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Blocks|29068 |29097
AssignedTo|bawolff
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
AssignedTo|roan.katt...@gmail.com |bawolff...@gmail.com
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #22 from Bawolff 2011-06-30 03:30:48 UTC ---
(In reply to comment #21)
> (In reply to comment #20)
> > leaving this as a deployment blocker since all that seems to be needed here is
> > a SMOP.
>
> This could potentially lead
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #21 from Bawolff 2011-06-30 03:29:49 UTC ---
(In reply to comment #20)
> leaving this as a deployment blocker since all that seems to be needed here is
> a SMOP.
This could potentially lead to invalid output for XML formats (since
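Bawolff's reason is cut off above. Assuming the concern is that an unnormalized `from` value may contain bytes that XML 1.0 cannot carry (invalid UTF-8, or code points outside the XML 1.0 `Char` production), a pre-flight check could look like this sketch. `isXmlSafeString()` is a hypothetical name; the character ranges are those of the XML 1.0 `Char` production.

```php
<?php
/**
 * Hypothetical check: true if $s is valid UTF-8 and every code point is
 * allowed by the XML 1.0 Char production:
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
function isXmlSafeString(string $s): bool
{
    // With the /u modifier PCRE rejects invalid UTF-8 outright:
    // preg_match() then returns false instead of 0 or 1.
    $pattern = '/\A[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/u';
    return preg_match($pattern, $s) === 1;
}

var_dump(isXmlSafeString("Caf\u{00E9}")); // bool(true)
var_dump(isXmlSafeString("bad\x01byte")); // bool(false): U+0001 is not an XML 1.0 Char
var_dump(isXmlSafeString("bad\xFFbyte")); // bool(false): invalid UTF-8
```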
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Blocks||29068
--- Comment #20 from Mark
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #19 from Mark A. Hershberger 2011-06-29 16:42:09 UTC ---
Bryan, Bawolff,
Could one of you take this and make the necessary changes to close the bug?
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Reedy changed:
What|Removed |Added
Blocks|29068 |
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Blocks||29068
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #18 from Bryan Tong Minh 2011-05-06 07:30:31 UTC ---
(In reply to comment #15)
> (In reply to comment #14)
> > Can't you do something like
> > $string2 = $string;
> > UtfNormal::quickIsNFCVerify( $string2 );
> > $stringIsValidUTF8 =
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #17 from Bawolff 2011-05-06 00:48:34 UTC ---
btw, if I recall correctly, we do some other normalization beyond NFC for the ml and ar wikis
(it is done only on wikis with those content languages, for performance
reasons, so if you get an interwik
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #16 from merl 2011-05-05 23:35:47 UTC ---
Just some statistics from my interwiki bot:
Each of my API requests normally contains 50 title values. The title values
themselves are the results of other API requests, so they should all be valid u
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
p858snake changed:
What|Removed |Added
Keywords||patch
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #15 from Brion Vibber 2011-05-05 22:35:30 UTC ---
(In reply to comment #14)
> Can't you do something like
> $string2 = $string;
> UtfNormal::quickIsNFCVerify( $string2 );
> $stringIsValidUTF8 = $string === $string2 ? true : false;
>
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Bawolff changed:
What|Removed |Added
CC||bawolff...@gmail.com
--- Comment #14 from Ba
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #13 from Brion Vibber 2011-05-05 17:43:15 UTC ---
Honestly I don't think we have a good way to do that right now; UtfNormal
combines it with the NFC stuff in quickIsNFCVerify(), and our fallbacks mean
that a call to iconv() or mv_c
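Brion notes that UTF-8 validation and the NFC check are combined in `quickIsNFCVerify()`. If a standalone validity test is wanted, plain PHP can do it without mbstring or iconv, because PCRE in `/u` mode rejects malformed input; the NFC question then becomes a separate comparison. This is a hedged sketch: `classifyTitleInput()` and its return labels are hypothetical, and `$toNFC` is an injected normalizer (for example `UtfNormal::toNFC` or intl's `Normalizer::normalize`).

```php
<?php
/**
 * Hypothetical classifier that keeps the two questions apart:
 * 1) is the input well-formed UTF-8?  2) if so, is it already NFC?
 *
 * @param callable(string): string $toNFC injected NFC normalizer
 */
function classifyTitleInput(string $raw, callable $toNFC): string
{
    // preg_match() returns false (not 0) when the subject is not valid
    // UTF-8 and the pattern uses /u; the empty pattern matches anything else.
    if (preg_match('//u', $raw) !== 1) {
        return 'invalid-utf8';
    }
    return $toNFC($raw) === $raw ? 'nfc' : 'valid-not-nfc';
}

// Toy stand-in normalizer: only composes "e" + U+0301 into U+00E9.
$toyNFC = fn(string $s): string => str_replace("e\u{0301}", "\u{00E9}", $s);
echo classifyTitleInput("Caf\u{00E9}", $toyNFC), "\n";  // nfc
echo classifyTitleInput("Cafe\u{0301}", $toyNFC), "\n"; // valid-not-nfc
echo classifyTitleInput("Caf\xE9", $toyNFC), "\n";      // invalid-utf8 (a Latin-1 byte)
```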
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #12 from Roan Kattouw 2011-05-05 17:32:41 UTC ---
(In reply to comment #11)
> So in short: don't worry about representing invalid UTF-8 byte sequences:
> either use a 'before' value that's been validated as UTF-8, or let the API
>
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #11 from Brion Vibber 2011-05-05 17:28:14 UTC ---
There are essentially two layers of work here, which our input validation
merges into a single step:
1) invalid UTF-8 sequences must be found and replaced with valid placeholder
ch
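The two layers Brion describes can be pulled apart as in the following sketch. MediaWiki's `UtfNormal::cleanUp()` performs both in one call; separating them would let the intermediate value (valid UTF-8, not yet normalized) be reported. This sketch swaps in a simple policy for step 1: each byte that does not start a well-formed UTF-8 sequence (per the RFC 3629 byte grammar) becomes one U+FFFD, rather than following the Unicode "maximal subpart" recommendation. The step-2 normalizer is injected. `replaceInvalidUtf8()` and `cleanTitleInTwoSteps()` are hypothetical names.

```php
<?php
/** Step 1 (hypothetical helper): replace every byte that does not start a
 *  well-formed UTF-8 sequence with U+FFFD, leaving valid text untouched. */
function replaceInvalidUtf8(string $s): string
{
    // Byte-mode pattern (no /u): group 1 = one well-formed UTF-8 character
    // per RFC 3629; group 2 = any other single byte.
    $pattern = '/(
          [\x00-\x7F]
        | [\xC2-\xDF][\x80-\xBF]
        | \xE0[\xA0-\xBF][\x80-\xBF]
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        | \xED[\x80-\x9F][\x80-\xBF]
        | \xF0[\x90-\xBF][\x80-\xBF]{2}
        | [\xF1-\xF3][\x80-\xBF]{3}
        | \xF4[\x80-\x8F][\x80-\xBF]{2}
    )|(.)/xs';
    return preg_replace_callback(
        $pattern,
        // Group 2 is only set when no valid character matched here.
        fn(array $m): string => isset($m[2]) ? "\u{FFFD}" : $m[1],
        $s
    );
}

/** Both layers in order: make the bytes valid UTF-8, then normalize. */
function cleanTitleInTwoSteps(string $raw, callable $toNFC): array
{
    $valid = replaceInvalidUtf8($raw);
    return ['validUtf8' => $valid, 'nfc' => $toNFC($valid)];
}

// Toy stand-in normalizer: only composes "e" + U+0301 into U+00E9.
$toyNFC = fn(string $s): string => str_replace("e\u{0301}", "\u{00E9}", $s);
var_dump(cleanTitleInTwoSteps("Cafe\u{0301}\xFF", $toyNFC));
```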
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #10 from Roan Kattouw 2011-05-05 17:21:54 UTC ---
(In reply to comment #9)
> Invalid UTF-8 is essentially random binary data and should thus be encoded,
> for example in base64.
Yeah. But I think it's fair not to offer this feat
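If the feature Bryan suggests were offered, it might look like this sketch: a `from` value that is valid UTF-8 is passed through as text, and anything else is base64-encoded and flagged so clients know to decode it. The function name and the `encoding` key are hypothetical, not part of the MediaWiki API.

```php
<?php
/** Hypothetical: build a "from" entry that is always safe to serialize. */
function encodeFromValue(string $raw): array
{
    if (preg_match('//u', $raw) === 1) {
        return ['from' => $raw]; // valid UTF-8: pass through as text
    }
    // Arbitrary bytes: base64 keeps them intact through JSON/XML output.
    return ['from' => base64_encode($raw), 'encoding' => 'base64'];
}

print_r(encodeFromValue("Caf\u{00E9}"));
print_r(encodeFromValue("Caf\xE9"));
```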
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #9 from Bryan Tong Minh 2011-05-05 17:00:52 UTC ---
(In reply to comment #6)
> I could armor the from value to protect it from Unicode normalization (I've
> written code for that before; I threw it out but I should be able to repro
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Roan Kattouw changed:
What|Removed |Added
Attachment #8504|0 |1
is obsolete|
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #7 from Roan Kattouw 2011-05-05 16:09:39 UTC ---
Created attachment 8504
--> https://bugzilla.wikimedia.org/attachment.cgi?id=8504
Stashing my work-in-progress changes here; this is as good a place as any.
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Roan Kattouw changed:
What|Removed |Added
CC||br...@wikimedia.org,
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
merl changed:
What|Removed |Added
CC||bugrepor...@to.mabomuja.de
--- Comment #5 from
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
Mark A. Hershberger changed:
What|Removed |Added
Priority|Normal |High
CC|
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #4 from Bryan Tong Minh 2011-03-05 14:42:33 UTC ---
The normalization is done in getGPCValue. Just add a boolean parameter
$normalize.
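Comment #4's proposal, sketched as a standalone function rather than the real `WebRequest::getGPCValue()`, whose exact signature and body are not shown in this thread. The `$normalize` flag lets a caller such as the API opt out of NFC for one value. Everything besides the `$normalize` idea itself (the function name, the other parameters, the injected `$toNFC`) is an assumption made for illustration.

```php
<?php
/**
 * Hypothetical stand-in for a GPC (GET/POST/COOKIE) lookup with the
 * boolean $normalize parameter suggested in comment #4.
 *
 * @param array<string, mixed> $arr e.g. $_GET or $_POST
 * @param callable(string): string $toNFC injected NFC normalizer
 */
function getGPCValueSketch(array $arr, string $name, $default, callable $toNFC, bool $normalize = true)
{
    if (!array_key_exists($name, $arr)) {
        return $default;
    }
    $value = $arr[$name];
    // Only string values are normalized in this sketch; others pass through.
    if ($normalize && is_string($value)) {
        $value = $toNFC($value);
    }
    return $value;
}

// Toy stand-in normalizer: only composes "e" + U+0301 into U+00E9.
$toyNFC = fn(string $s): string => str_replace("e\u{0301}", "\u{00E9}", $s);
$get = ['titles' => "Cafe\u{0301}"];
var_dump(getGPCValueSketch($get, 'titles', null, $toyNFC));        // normalized
var_dump(getGPCValueSketch($get, 'titles', null, $toyNFC, false)); // raw bytes
```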
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #3 from Reedy 2011-03-05 14:40:56 UTC ---
Looks like we might need to cache it earlier, since whenever the normalization
is called it just overwrites them all.
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #2 from Bryan Tong Minh 2011-03-05 14:28:30 UTC ---
We can add a function to WebRequest to return the original value instead of the
normalized.
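Comments #2 and #3 point the same way: keep a copy of the raw input before normalization overwrites it, and expose it through a separate accessor. A minimal sketch, assuming a request object that normalizes eagerly at construction and receives flat string values; the class `RawAwareRequest` and its method names are hypothetical and are not claimed to match WebRequest.

```php
<?php
/** Hypothetical request wrapper that caches raw values before NFC. */
final class RawAwareRequest
{
    /** @var array<string, string> values exactly as received */
    private array $raw;
    /** @var array<string, string> NFC-normalized values */
    private array $normalized;

    /**
     * @param array<string, string> $input e.g. $_GET, assumed flat here
     * @param callable(string): string $toNFC injected NFC normalizer
     */
    public function __construct(array $input, callable $toNFC)
    {
        $this->raw = $input; // cached *before* normalization (comment #3)
        $this->normalized = array_map($toNFC, $input); // keys are preserved
    }

    public function getNormalizedValue(string $name, ?string $default = null): ?string
    {
        return $this->normalized[$name] ?? $default;
    }

    /** The accessor comment #2 asks for: the original, unnormalized value. */
    public function getRawValue(string $name, ?string $default = null): ?string
    {
        return $this->raw[$name] ?? $default;
    }
}

// Toy stand-in normalizer: only composes "e" + U+0301 into U+00E9.
$toyNFC = fn(string $s): string => str_replace("e\u{0301}", "\u{00E9}", $s);
$req = new RawAwareRequest(['titles' => "Cafe\u{0301}"], $toyNFC);
var_dump($req->getRawValue('titles') !== $req->getNormalizedValue('titles')); // bool(true)
```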
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849
--- Comment #1 from Brion Vibber 2011-03-05 00:22:10 UTC ---
IIRC, this normalization is applied on raw input in WebRequest, so the API code
would only ever see the NFC form in the first place.
For it to know anything had changed, it would hav