Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Zbigniew Lukasiak
On Fri, Feb 20, 2009 at 6:57 PM, Jonathan Rockway j...@jrock.us wrote:

 Braindump follows.

snip


 One last thing, if this becomes core, it will definitely break people's
 apps.  Many, many apps are blissfully unaware of characters and treat
 text as binary... and their apps kind-of appear to work.  As soon as
 they get some real characters in their app, though, they will have
 double-encoded nonsense all over the place, and will blame you for this.
 (I loaded Catalyst::Plugin::Unicode, and my app broke!  It's all your
 fault.  Yup, people mail that to me privately all the time.  For some
 reason, they think I am going to personally fix their app, despite
 having written volumes of documentation about this.  Wrong.)


Some more things to consider.

- 'use utf8' in the code generated by the helpers?

- ENCODING: UTF-8 for the TT view helper?

Maybe a global config option to choose the byte or character semantics?

But with the DB it becomes a bit more complex - because BLOB columns
probably need to use byte semantics.
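A minimal sketch of what a generated TT view might look like with both suggestions applied (hypothetical helper output, not what the helpers currently emit; ENCODING is the Template Toolkit option for decoding template files):

```perl
package MyApp::View::TT;
use strict;
use warnings;
use utf8;                      # string literals in this file are UTF-8
use base 'Catalyst::View::TT';

# ENCODING tells Template Toolkit to decode template files as UTF-8.
__PACKAGE__->config(ENCODING => 'UTF-8');

1;
```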

-- 
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/

___
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Neo [GC]

Zbigniew Lukasiak schrieb:

Some more things to consider.

- 'use utf8' in the code generated by the helpers?
  
Reasonable, but only if documented. It took us weeks to learn
that this changes _nothing_ but the behaviour of several
Perl functions like regexp, sort and so on.

- ENCODING: UTF-8 for the TT view helper?

Maybe a global config option to choose the byte or character semantics?

But with the DB it becomes a bit more complex - because BLOB columns
probably need to use byte semantics.
  
Uhm, of course, as BLOB is Binary and CLOB is Character. ;) This is even
more complex, as the databases treat these datatypes differently
and some of Perl's DBI drivers are somewhat broken when it comes
to Unicode (according to our perl-saves-our-souls guru).
UTF-8 is ok in Perl itself (not easy, not coherent, but ok); but in
combination with many modules (and as far as I learned, Perl is all about
reusing modules) it is _hell_. Try to read UTF-8 from an HTTP request,
store it in a database, select with correct order, write it to XLS, convert to
CSV, reimport it into the DB and output it to the browser, all with
different subs in the same controller... and you know what I mean.
Even our most euphoric Perl gurus don't have a clue how to handle
UTF-8 from beginning to end without hours of trial and error in
their programs (and remember - we Germans only have those bloody
Umlauts - try to imagine this in China).


Maybe the best thing for all average-and-below users would be a _really_
good tutorial about Catalyst+UTF-8. What to do, what not to do. How to
read UTF-8 from an HTTP request / uploaded file / local file / database,
how to write it to the client / a downloadable file / local file / database.
Which Catalyst variable is UTF-8-encoded when and why. How to
determine what encoding a given scalar has and how to
encode/decode/whatevercode it to a bloody nice scalar with shiny UTF-8
chars in it.

Short: -- Umlauts with Catalyst for dummies --



(Sorry for sounding so emotional, but our company burned man-weeks
on solving minor encoding bugs :-/ Every tutorial we found was like "you
can do it this way or that way or another way 'round the house, so it's
perfect, and if you don't understand it, you're a retard and should use
7-bit ASCII"... while lately even a colleague sounds like this - as he is
enlightened by CPAN literature like "UTF-8 vs. utf8 vs. UTF8" ;)).



Greets and regards,
Tom Weber



Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Zbigniew Lukasiak
On Mon, Feb 23, 2009 at 2:58 PM, Neo [GC] n...@gothic-chat.de wrote:
 Zbigniew Lukasiak schrieb:

 Some more things to consider.

 - 'use utf8' in the code generated by the helpers?


 Reasonable, but only if documented. It took us weeks to learn that this
 changes _nothing_ but the behaviour of several Perl functions like
 regexp, sort and so on.

Hmm - in my understanding it only changes literals in the code ( $var
= 'ą' ).  So I looked into the pod and it says:

    Bytes in the source text that have their high-bit set will be treated
    as being part of a literal UTF-8 character.  This includes most
    literals such as identifier names, string constants, and constant
    regular expression patterns.
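A tiny illustration of that difference (a sketch: the "use utf8" effect is simulated with an explicit decode and byte escapes, so it behaves the same regardless of how this mail itself is encoded):

```perl
use strict;
use warnings;
use Encode ();

# Without "use utf8", a UTF-8-encoded "ä" in the source is two separate
# bytes; with it, perl decodes the source and the literal is one character.
my $without = "\xc3\xa4";                           # byte semantics
my $with    = Encode::decode('UTF-8', "\xc3\xa4");  # what "use utf8" yields
print length($without), "\n";   # 2
print length($with),    "\n";   # 1
```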


 - ENCODING: UTF-8 for the TT view helper?

 Maybe a global config option to choose the byte or character semantics?

 But with the DB it becomes a bit more complex - because BLOB columns
 probably need to use byte semantics.


 Uhm, of course, as BLOB is Binary and CLOB is Character. ;) This is even
 more complex, as the databases treat these datatypes differently and some
 of Perl's DBI drivers are somewhat broken when it comes to Unicode
 (according to our perl-saves-our-souls guru).
 UTF-8 is ok in Perl itself (not easy, not coherent, but ok); but in
 combination with many modules (and as far as I learned, Perl is all about
 reusing modules) it is _hell_. Try to read UTF-8 from an HTTP request, store it
 in a database, select with correct order, write to XLS, convert to CSV, reimport
 it into the DB and output it to the browser, all with different subs in the
 same controller... and you know what I mean.
 Even our most euphoric Perl gurus don't have any clue how to handle UTF-8
 from beginning to end without hours of trial and error in their
 programs (and remember - we Germans only have those bloody Umlauts - try
 to imagine this in China).

 Maybe the best thing for all average-and-below users would be a _really_
 good tutorial about Catalyst+UTF-8. What to do, what not to do. How to read
 UTF-8 from an HTTP request / uploaded file / local file / database, how to
 write it to the client / a downloadable file / local file / database. Which
 Catalyst variable is UTF-8-encoded when and why. How to determine what
 encoding a given scalar has and how to encode/decode/whatevercode it to a
 bloody nice scalar with shiny UTF-8 chars in it.
 Short: -- Umlauts with Catalyst for dummies --


Hmm - maybe I'll add UTF-8 handling in InstantCRUD.  I am waiting for
good sentences showing off the national characters.


-- 
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/



Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Neo [GC]



Zbigniew Lukasiak schrieb:

Hmm - in my understanding it only changes literals in the code ( $var
= 'ą' ).  So I looked into the pod and it says:

    Bytes in the source text that have their high-bit set will be treated
    as being part of a literal UTF-8 character.  This includes most
    literals such as identifier names, string constants, and constant
    regular expression patterns.
  

Ah SORRY! In my confusion I've confused it again...
So if I get it right, "use utf8" means you can do stuff like $s =~
s/a/ä/; (as the plain ä in the source will be treated as one character
and not two octets), while the magical utf8 flag on $s tells perl that
the ä in the scalar really is an ä and not two strange octets.

Am I right or am I completely lost again?

Hmm - maybe I'll add UTF-8 handling in InstantCRUD.  I am waiting for
good sentences showing off the national characters.
Does it have to be a complete sentence? My favourite test string is
something like

äöüÄÖÜß"'+ (UTF-8)
C3 A4 C3 B6 C3 BC C3 84 C3 96 C3 9C C3 9F 22 27 2B (Hex)

If I can put this string into some HTML form, POST/GET it, process it,
save it to and read it from the DB, output it to the browser _and_ still
have exactly 10 characters, the application _might_ work as it should.
The Umlauts and the Eszett are the pain of Unicode, the " and ' are
fun with HTML and escaping, and the + ... well, URI encoding, you know...
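That character-count check can be sketched in a few lines of Perl (using the hex dump above, decoded and re-encoded):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The octets from the hex dump above: seven two-byte Umlaut/Eszett
# sequences plus ", ' and + as single bytes -- 17 octets, 10 characters.
my $octets = "\xC3\xA4\xC3\xB6\xC3\xBC\xC3\x84\xC3\x96\xC3\x9C\xC3\x9F\x22\x27\x2B";
my $chars  = decode('UTF-8', $octets);    # octets -> characters

print length($chars), "\n";               # 10 -- one per character
print encode('UTF-8', $chars) eq $octets
    ? "round-trip ok\n" : "round-trip broken\n";
```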


For even more fun, one should do a regex in the application using utf8
(give me all those äÄs) and select it from the DB, first with blahfield
LIKE 'ä', maybe upper(blahfield) LIKE upper('ä') and finally an
ORDER BY blahfield, where blahfield should contain one row starting
with 'a', one with 'ä' and one with 'b', and the output should have
exactly this order and _not_ a,b,ä (hint hint: utf8 treated as ASCII
or latin1).



Greets and regards,
Tom Weber



Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Neo [GC]
Oh, I forgot something... or more precisely, my boss named it while
having a smoke. Maybe somewhat OT, but definitely interesting (maybe it
could be used to simplify the problem of double-encoding):


Does anyone know a _safe_ method to convert _any_ string scalar to utf8?
Something like
anything_to_utf8($s)
, regardless of whether $s contains ascii, latin1, utf8, tasty hodgepodge or
hot fn0rd, whether the utf8 flag is set or not, and unaffected by the full
moon and my horrorscope, _without_ doing double-encoding (there MUST be some
way to determine if it already is utf8... my silly java editor can do it and
perl makes difficult things at least possible).
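For what it's worth, here is a best-effort sketch using plain Encode (anything_to_utf8 is the hypothetical name from above, not an existing function). It is a heuristic, not magic: reliably detecting the encoding of arbitrary octets is impossible in general; this version exploits the fact that most Latin-1 text is invalid as strict UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper as asked for above. A scalar that already carries
# the UTF8 flag is returned as-is (avoiding double-decoding); otherwise
# try a strict UTF-8 decode and fall back to Latin-1, which never fails.
sub anything_to_utf8 {
    my $s = shift;                     # shift copies; caller's scalar is safe
    return $s if utf8::is_utf8($s);    # already decoded characters: no-op
    my $copy = $s;                     # decode() with FB_CROAK eats its input
    my $decoded = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $decoded ? $decoded : Encode::decode('ISO-8859-1', $s);
}

print anything_to_utf8("\xc3\xa4") eq "\x{e4}" ? "utf8 ok\n"   : "nope\n";
print anything_to_utf8("\xe4")     eq "\x{e4}" ? "latin1 ok\n" : "nope\n";
```

Note the fundamental caveat: octets that are valid in both encodings (plain ASCII, or Latin-1 text that happens to form valid UTF-8 sequences) are silently taken as UTF-8, so this cannot be made 100% safe.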



I would greatly appreciate this philosopher's stone and will send my hero
a bottle of finest Bavarian (Munich!) beer called Edelstoff (precious
stuff - tasty).



Greets and thanks!
Tom Weber



Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Peter Karman
Neo [GC] wrote on 02/23/2009 09:41 AM:

 Does anyone know a _safe_ method to convert _any_ string-scalar to utf8?
 Something like
 anything_to_utf8($s)
 , regardless if $s contains ascii, latin1, utf8, tasty hodgepodge or hot
 fn0rd, utf8-flag is set or not and is neither affected by full moon nor
 my horrorscope, _without_ doing double-encoding (there MUST be some way
 to determine if it already is utf8... my silly java editor can do it and
 perl makes difficult things at least possible).
 
 
 I would greatly appreciate this philosophers stone and will send my hero
 a bottle of finest bavarian (munich!) beer called Edelstoff (precious
 stuff - tasty).
 

Search::Tools::UTF8::to_utf8() comes close. It won't handle mixed
encodings in a single string (which would be garbage anyway), but it does
try to prevent double-encoding and uses the Encode goodness under the hood.

-- 
Peter Karman  .  pe...@peknet.com  .  http://peknet.com/




Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Octavian Râşniţă

From: Peter Karman pe...@peknet.com
Neo [GC] wrote on 02/23/2009 09:41 AM:



Does anyone know a _safe_ method to convert _any_ string-scalar to utf8?
Something like
anything_to_utf8($s)
, regardless if $s contains ascii, latin1, utf8, tasty hodgepodge or hot
fn0rd, utf8-flag is set or not and is neither affected by full moon nor
my horrorscope, _without_ doing double-encoding (there MUST be some way
to determine if it already is utf8... my silly java editor can do it and
perl makes difficult things at least possible).


I would greatly appreciate this philosophers stone and will send my hero
a bottle of finest bavarian (munich!) beer called Edelstoff (precious
stuff - tasty).



Search::Tools::UTF8::to_utf8() comes close. It won't handle mixed
encoding in a single string (which would be garbage anyway) but it does
try to prevent double-encoding and uses the Encode goodness under the 
hood.


--
Peter Karman  .  pe...@peknet.com  .  http://peknet.com/


I understand that there are reasons for not transforming all the encodings
to UTF-8 in core, even though it seems not very complicated: maybe
there are some tables that contain ISO-8859-2 chars and other tables
that contain ISO-8859-1 chars, and when the data needs to be saved, it
should keep its original encoding.


But if somebody wants to create a new Catalyst app, with a new database, new
templates, controllers, etc., I think it could be very helpful if the
programmer only needed to specify once that he wants to use UTF-8
everywhere - in the database, in the templates, in the configuration files
of HTML::FormFu, in the controllers - and not in several places in the
configuration file, or specify UTF8Columns in DBIC classes...

It could be a kind of default.

Octavian



Re: [Catalyst] Re: decoding in core

2009-02-23 Thread Bill Moseley
On Mon, Feb 23, 2009 at 06:45:40PM +0200, Octavian Râşniţă wrote:
 I understand that there are reasons for not transforming all the 
 encodings to UTF-8 in core, even though it seems to be not very 
 complicated, because maybe there are some tables that contain ISO-8859-2 
 chars and other tables that contain ISO-8859-1 chars, and when the data 
 need to be saved, it should keep its original encoding.

Don't think about transforming encodings to UTF-8.

In the vast majority of cases people expect to work with characters,
and that's what Perl works with internally.  UTF-8 is an encoding, not
characters.

The HTTP request is octets.  The HTTP request specifies what encoding
those octets represent and it's that encoding that is used to decode
the octets into characters.  The fact that Perl uses UTF-8 internally
is best ignored -- it's just characters inside Perl once decoded.

Conceptually it's not much different from a request with
Content-Encoding: gzip -- before using the request body parameters
the gzipped octets must obviously be decoded.  Likewise, the body must
be url-decoded into separate parameters.  And again, the resulting
octets must be decoded into characters if the parameters are to be
used as characters.  That last step has often been ignored.
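For illustration, that decode step at the edge might look like this (names invented for the sketch; the charset comes from the request's Content-Type header):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sketch: extract the charset the request declares and decode the raw
# body octets into Perl characters.
my $content_type = 'application/x-www-form-urlencoded; charset=ISO-8859-1';
my ($charset) = $content_type =~ /charset=([\w.-]+)/i;
$charset ||= 'UTF-8';                 # a common default when unspecified

my $raw_body = "caf\xe9";             # octets as they came off the wire
my $chars    = decode($charset, $raw_body);
print length($chars), "\n";           # 4 characters, ending in "é"
```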

Then, when sending a response, the (abstract) characters inside
Perl must first be encoded into octets.

Those things should be handled at the edge of the application, and
that would be in the Engine (or the code the Engine uses).

Yes, the same thing has to happen with templates, the database, and
all external data sources.  Those are separate issues.  HTTP provides
a standard way to determine how to encode and decode.


-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt




Re: [Catalyst] Re: decoding in core

2009-02-22 Thread Bill Moseley
On Fri, Feb 20, 2009 at 11:57:29AM -0600, Jonathan Rockway wrote:
 
 The problem with writing a plugin or making this core is that people
 really really want to misuse Unicode, and will whine when you try to
 force correctness upon them.

I'm not sure what you mean by wanting to misuse Unicode.  You mean
like decoding using a different encoding than the charset given in the
HTTP headers?

 The only place where you are really allowed to use non-ASCII characters
 are in the request and response.  (HTTP has a way of representing the
 character encoding of its payload -- URLs and Cookies don't.)
 
 C::P::Unicode handles this correct usage correctly.

I disagree there.  First, it assumes utf8 instead of what the
request states as the encoding.  That is generally okay (where you set
accept-charset in your forms), but why not decode as the request
states?

Second, it only decodes the request parameters.  The body_parameters
and query_parameters are left undecoded.

Is that by design?  That is, is it expected that in a POST
$c->req->parameters->{foo} would be characters where
$c->req->body_parameters->{foo} is undecoded octets?  I would not want
or expect that.


 The problem is that
 people want Unicode to magically work where it's not allowed.  This
 includes HTTP headers (WTF!?), and URLs.  (BTW, when I say Unicode, I
 don't necessarily mean Unicode... I mean non-ASCII characters.  The
 Japanese character sets contain non-Unicode characters, and some people
 want to put these characters in their URLs or HTTP headers.  I wish I
 was making this up, but I am not.  The Unicode process really fucked over
 the Asian languages.)

I'm not sure we want to go down that path.  Maybe a plugin for doing
crazy stuff with HTTP header encoding, but my initial email was really
just about moving decoding of the body (when we have a charset in the
request) and encoding on sending (again if there's a charset in the
response headers) into core.

Trying to do more than that is probably asking for headaches (and
whining).


I think there's reasonable debate at what point in the request
decoding should happen, though.  Frankly, I'm not sure Catalyst should
decode, rather HTTP::Body should.  HTTP::Body looks at the content
type header and if it's application/x-www-form-urlencoded it will
decode the body into separate parameters.  But, why should it ignore
decoding the charset also specified?



The query parameters are more troublesome, of course.  Seems the
common case is to use utf8 as the encoding in URLs, and in the end the
encoding just has to be assumed (or specified as a separate
parameter).  uri_for()'s current behavior of encoding to utf8 is
probably a good way to go, and to just always decode the query
parameters as utf8 in Catalyst.  I suppose uri_for() could add an
additional _enc=utf8 parameter to allow for different encodings, but
I can't imagine where just assuming utf8 would not be fine.

Of course, someone will want to mix encodings in different query
parameters.
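The uri_for() convention described above -- characters encoded to UTF-8 octets, then percent-escaped -- can be sketched with core Perl only (in real code URI::Escape does the escaping; here it is spelled out with sprintf):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# characters -> UTF-8 octets -> %XX escapes (what uri_for() effectively does)
my $chars   = "\x{e4}";   # "ä" as a single character
my $escaped = join '', map { sprintf '%%%02X', ord }
              split //, encode('UTF-8', $chars);
print $escaped, "\n";     # %C3%A4

# ...and the reverse when reading query parameters back in:
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
my $back = decode('UTF-8', $octets);
print $back eq $chars ? "round-trip ok\n" : "mismatch\n";
```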


 There are subtle issues, like knowing not to touch XML (it's binary),
 dealing with $c->res->body( filehandle ), and so on.

The layer can be set on the file handle.  XML will be decoded as
application/octet-stream by HTTP::Body, so that should be ok.
Although, if there's a charset in the request I would still
probably argue that decoding would be the correct thing to do.

For custom processing I currently extend HTTP::Body.  For example:

$HTTP::Body::TYPES->{'text/xml'} = 'My::XML::Parser';

which does stream parsing of the XML and thus handles the XML
charset decoding.

 One last thing, if this becomes core, it will definitely break people's
 apps.  Many, many apps are blissfully unaware of characters and treat
 text as binary... and their apps kind-of appear to work.  As soon as
 they get some real characters in their app, though, they will have
 double-encoded nonsense all over the place, and will blame you for this.

That may be true for some.  For most they probably have simply ignored
encoding and don't realize they are working with octets instead of
characters, and thanks to Perl it just all works.  So working with
real characters instead will likely be transparent for them.

Catalyst::Plugin::Unicode blindly decodes using utf8::decode() and I
think that's a no-op if the content has already been decoded (the utf8
flag is already set).  Likewise, it only encodes if the utf8 flag is
set.  So, users of that plugin should be ok if character encoding
were handled in core and they didn't remove the plugin.

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt




Re: [Catalyst] Re: decoding in core (Was: [Announce] Catalyst-Runtime-5.8000_05)

2009-02-20 Thread Tomas Doran


On 6 Feb 2009, at 17:36, Bill Moseley wrote:


Sure.  IIRC, I think there's already been some patches and code posted
so maybe I can dig that up again off the archives.


Please do.


But, sounds like
it's not that important of an issue.


The fact that nobody is working on it currently is not an indication  
that it isn't an important problem to try to solve.


Cheers
t0m




Re: [Catalyst] Re: decoding in core

2009-02-20 Thread Jonathan Rockway

Braindump follows.

* On Fri, Feb 20 2009, Tomas Doran wrote:
 On 6 Feb 2009, at 17:36, Bill Moseley wrote:

 Sure.  IIRC, I think there's already been some patches and code posted
 so maybe I can dig that up again off the archives.

 Please do.

 But, sounds like
 it's not that important of an issue.

 The fact that nobody is working on it currently is not an indication
 that it isn't an important problem to try to solve.

I meant to write a plugin to do this a long time ago, but I guess I
never cared enough.

The problem with writing a plugin or making this core is that people
really really want to misuse Unicode, and will whine when you try to
force correctness upon them.

The only places where you are really allowed to use non-ASCII characters
are the request and response.  (HTTP has a way of representing the
character encoding of its payload -- URLs and Cookies don't.)

C::P::Unicode handles this correct usage correctly.  The problem is that
people want Unicode to magically work where it's not allowed.  This
includes HTTP headers (WTF!?), and URLs.  (BTW, when I say Unicode, I
don't necessarily mean Unicode... I mean non-ASCII characters.  The
Japanese character sets contain non-Unicode characters, and some people
want to put these characters in their URLs or HTTP headers.  I wish I
was making this up, but I am not.  The Unicode process really fucked over
the Asian languages.)

So anyway, the plugin basically needs to have the following config
options, so users can specify what they want.  Inside Catalyst, only
Perl characters should be allowed, unless you mark the string as binary
(there is a CPAN module that does this, Something::BLOB).

  * Input HTTP header encoding (ASCII default)
    (this is for data in $c->req->headers, cookies, etc.)
    (perhaps cookies should be separately configured)

  * Input URI encoding (probably UTF-8 default)
(the dispatcher will dispatch on the decoded characters)
(source code encoding is handled by Perl, hopefully)

  * Input request body encoding (read HTTP headers and decide)

  * Output HTTP headers encoding (maybe die if this happens, because
it's totally illegal to have non-ascii in the headers)

  * Output URI encoding ($c->uri_for and friends will use this to
    translate the names of actions that are named with wide characters)

  * Output response body encoding (this needs to update the HTTP
headers, namely the charset= part of Content-type)

I think that is everything.
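As a sketch only -- these option names are invented here to mirror the list above, not an existing plugin API:

```perl
__PACKAGE__->config(
    encoding => {
        request_headers  => 'ascii',         # $c->req->headers, cookies
        request_uri      => 'UTF-8',         # dispatch on decoded characters
        request_body     => 'from-headers',  # use the charset the request declares
        response_headers => 'ascii',         # or die on non-ASCII
        response_uri     => 'UTF-8',         # uri_for() and friends
        response_body    => 'UTF-8',         # also sets charset= in Content-Type
    },
);
```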

There are subtle issues, like knowing not to touch XML (it's binary),
dealing with $c->res->body( filehandle ), and so on.

One last thing, if this becomes core, it will definitely break people's
apps.  Many, many apps are blissfully unaware of characters and treat
text as binary... and their apps kind-of appear to work.  As soon as
they get some real characters in their app, though, they will have
double-encoded nonsense all over the place, and will blame you for this.
(I loaded Catalyst::Plugin::Unicode, and my app broke!  It's all your
fault.  Yup, people mail that to me privately all the time.  For some
reason, they think I am going to personally fix their app, despite
having written volumes of documentation about this.  Wrong.)

Anyway, I just wanted to get this out of my head and onto paper, for
someone else to look at and perhaps implement. :)

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"



Re: [Catalyst] Re: decoding in core

2009-02-20 Thread Jonathan Rockway
* On Fri, Feb 20 2009, Jonathan Rockway wrote:
 Braindump follows.

Oh yeah, one other thing.  IDNs will need to be decoded/encoded,
probably.  ($c->req->host should contain perl characters, but links
should probably be punycoded.  Fun!)

--
print just => another => perl => hacker => if $,=$"



Re: [Catalyst] Re: decoding in core (Was: [Announce] Catalyst-Runtime-5.8000_05)

2009-02-06 Thread Tomas Doran


On 6 Feb 2009, at 14:46, Bill Moseley wrote:

Nobody responded to the main point of this email -- whether Catalyst should
handle encoding in core instead of with a plugin.  Nobody has an
opinion about that?  Or was it just ignored -- which is often how
people handle character encoding in applications. ;)


Does it make a difference if it's in core or in a plugin?

In your original email you said that the existing plugins don't do it
right... Which is quite possibly fair criticism, however I don't see
how moving the functionality into core would help the code be more
correct... Saying 'Plugin X is broken', 'Let's move Plugin X into core'
doesn't sound very convincing from where I'm sat. :-)


Code speaks louder than words, so if you'd like to provide some  
failing tests for what you think encoding _should_ be doing, that'd  
probably be a better basis for further discussion.


Cheers
t0m




Re: [Catalyst] Re: decoding in core (Was: [Announce] Catalyst-Runtime-5.8000_05)

2009-02-06 Thread Bill Moseley
On Fri, Jan 30, 2009 at 11:44:57PM +0100, Aristotle Pagaltzis wrote:
 * Bill Moseley mose...@hank.org [2009-01-29 17:05]:
  Neither of the existing plugins do it correctly (IMO), as
  they only decode parameters leaving body_parameters as octets,
  and don't look at the request for the charset, IIRC. […]
  uri_for() rightly encodes to octets before escaping, but it
  always encodes to utf-8. Is it assumed that query parameters
  are always utf-8 or should they be decoded with the charset
  specified in the request?
 
 The URI should always be assumed to be UTF-8 encoded octets.
 The body should be decoded according to the charset declared
 in the header by the browser.

Assume UTF-8 because that's how the application encoded the
URL in the first place?  Is UTF-8 specified in an RFC?  I thought
URIs were defined as characters with ASCII encoding for transmission.


Nobody responded to the main point of this email -- whether Catalyst should
handle encoding in core instead of with a plugin.  Nobody has an
opinion about that?  Or was it just ignored -- which is often how
people handle character encoding in applications. ;)

-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt


___
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


Re: [Catalyst] Re: decoding in core (Was: [Announce] Catalyst-Runtime-5.8000_05)

2009-02-06 Thread Bill Moseley
On Fri, Feb 06, 2009 at 03:16:14PM +, Tomas Doran wrote:

 On 6 Feb 2009, at 14:46, Bill Moseley wrote:
 Nobody responded to the main point of this email -- whether Catalyst should
 handle encoding in core instead of with a plugin.  Nobody has an
 opinion about that?  Or was it just ignored -- which is often how
 people handle character encoding in applications. ;)

 Does it make a difference if it's in core or in a plugin?

 In your original email you said that the existing plugins don't do it
 right... Which is quite possibly fair criticism, however I don't see how
 moving the functionality into core would help the code be more correct...
 Saying 'Plugin X is broken', 'Let's move Plugin X into core' doesn't sound
 very convincing from where I'm sat. :-)

Two different issues, although I would assume that if you moved it into
core there would be more careful consideration and discussion about
how to do it.  Which is why I posted -- for a discussion.

The question is whether encoding should be a core function.  A plugin works,
but not everyone uses it.  My argument for doing it in core is that
inside Perl is character data, so it must be decoded at
some point, and it's Catalyst (and the engines) that load the
parameters.  And if it's character data on the inside, it has to be
encoded when writing.

 Code speaks louder than words, so if you'd like to provide some failing 
 tests for what you think encoding _should_ be doing, that'd probably be a 
 better basis for further discussion.

Sure.  IIRC, I think there's already been some patches and code posted
so maybe I can dig that up again off the archives.  But, sounds like
it's not that important of an issue.




-- 
Bill Moseley
mose...@hank.org
Sent from my iMutt

