Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-19 Thread Jürgen Herrmann

On Mon, December 19, 2005 17:44, Andreas Jung wrote:
>
> --On 9 December 2005 10:11:42 -0500 Jim Fulton <[EMAIL PROTECTED]> wrote:
>
>> I forgot a very important need:
>>
>> - Common approach to Unicode
>>
>> We need to migrate Zope 2 to use a similar strategy.  We need
>> volunteers to brainstorm how this can be done and make one or more
>> proposals. This is likely a prerequisite for finishing the publisher
>> and ZPT work.
>
> I think there are two approaches. Textual content can be produced by nearly
> every Zope object (and its methods). Content can be composed by the
> basic Zope functionalities like DTML, ZPT, PyScripts and external
> methods (in the sense that these functionalities are able to call other
> objects and their methods).
>
> a) one could enforce all objects to return unicode (which would be a
> very hard requirement) and possibly break any application

hi all!
i think a) is surely the cleaner solution.

actually i always...
- include a content-type header and set the charset for all forms to utf8
- use :utf8:ustring/utext/ulines/utokens converters on all forms
this way i always have unicode strings in my db. up to now i never had any
unicode problems, except when interfacing external systems and not doing
the proper unicode conversion dance there...
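
The idea behind decoding every form field on the way in can be sketched
roughly as follows. This is only an illustration, not Zope's converter
code: the function names are hypothetical, and it is written for Python 3,
where `bytes` plays the role of the Python 2 `str` and `str` the role of
`unicode`.

```python
# Hypothetical helpers mimicking what ustring/ulines/utokens-style
# converters do conceptually: turn submitted bytes into unicode text.

def decode_ustring(raw, encoding="utf-8"):
    """Decode one submitted form value into unicode text."""
    return raw.decode(encoding)

def decode_ulines(raw, encoding="utf-8"):
    """Decode a multi-line value into a list of unicode lines."""
    return decode_ustring(raw, encoding).splitlines()

def decode_utokens(raw, encoding="utf-8"):
    """Decode a value and split it on whitespace into unicode tokens."""
    return decode_ustring(raw, encoding).split()

# Bytes as a browser would submit them from a form served as UTF-8.
name = decode_ustring("Jürgen".encode("utf-8"))
tags = decode_utokens("zope ünicode".encode("utf-8"))
```

With every field decoded like this, only unicode text reaches the
database, which is the property described above.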

why not design a migration script that converts all non-ascii strings in
the db to unicode strings, based on the default encoding, etc...
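
A minimal sketch of such a migration pass, assuming the stored data can be
walked as nested dicts and lists. The function name and the sample record
are made up for illustration; a real script would have to traverse ZODB
objects instead. Python 3 syntax is used, with `bytes` standing in for the
old byte strings.

```python
def migrate(value, default_encoding="latin-1"):
    """Return a copy of value with non-ASCII byte strings decoded to text."""
    if isinstance(value, bytes):
        if value.isascii():
            # Pure ASCII is left alone, as in the proposal above.
            return value
        return value.decode(default_encoding)
    if isinstance(value, dict):
        return {key: migrate(item, default_encoding)
                for key, item in value.items()}
    if isinstance(value, list):
        return [migrate(item, default_encoding) for item in value]
    # Anything else (numbers, existing unicode text, ...) is kept as is.
    return value

# Hypothetical record that was stored with latin-1 encoded byte strings.
record = {"id": b"doc1", "title": b"caf\xe9", "tags": [b"\xfcber", "ok"]}
migrated = migrate(record, "latin-1")
```

The hard part is not the traversal but knowing which encoding each
existing string really uses; the single `default_encoding` here is the
simplest possible assumption.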

also just some thoughts :)

jürgen herrmann

ps: what's especially critical here is code that handles filenames. some
filesystems just don't handle unicode filenames, already had
some headache there :)
>
> b) convert non-unicode content produced by Zope objects from where they are
> called (DTML, ZPT, PyScript, Extmethods) to unicode. This would limit
> the number of places where we need to change code. The encoding of the
> non-unicode content could be from the 'content-type: XXX; charset='
> header (if set) or as fallback from the configured
> zpublisher_default_encoding. An object could also set a property
> "my_output_encoding" (or so)...
>
> just-some-thoughts...
> -aj
>
> ___
> Zope-Dev maillist  -  Zope-Dev@zope.org
> http://mail.zope.org/mailman/listinfo/zope-dev
> **  No cross posts or HTML encoding!  **
> (Related lists -
>  http://mail.zope.org/mailman/listinfo/zope-announce
>  http://mail.zope.org/mailman/listinfo/zope )
>


___

>> XLhost.de - eXperts in Linux hosting <<

Jürgen Herrmann
Bruderwöhrdstraße 15b, DE-93051 Regensburg

Fon:  +49 (0)700 XLHOSTDE [0700 95467833]
Fax:  +49 (0)721 151 463027
WEB:  http://www.XLhost.de





Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-19 Thread Andreas Jung



--On 9 December 2005 10:11:42 -0500 Jim Fulton <[EMAIL PROTECTED]> wrote:

> I forgot a very important need:
>
> - Common approach to Unicode
>
> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
> to brainstorm how this can be done and make one or more proposals.
> This is likely a prerequisite for finishing the publisher and ZPT
> work.



I think there are two approaches. Textual content can be produced by nearly 
every Zope object (and its methods). Content can be composed by the basic
Zope functionalities like DTML, ZPT, PyScripts and external methods (in the 
sense that these functionalities are able to call other objects and their 
methods).


a) one could enforce all objects to return unicode (which would be a very 
hard requirement) and possibly break any application


b) convert non-unicode content produced by Zope objects from where they are 
called (DTML, ZPT, PyScript, Extmethods) to unicode. This would limit the 
number of places where we need to change code. The encoding of the 
non-unicode content could be from the 'content-type: XXX; charset=' 
header (if set) or as fallback from the configured 
zpublisher_default_encoding. An object could also set a property 
"my_output_encoding" (or so)...
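
The lookup described in b) could look roughly like the sketch below. The
precedence (header first, then the object property, then the configured
default) is an assumption, since the text does not fix an order; the
function names are hypothetical and `DEFAULT_ENCODING` merely stands in
for the configured zpublisher_default_encoding. Python 3 syntax, with
`bytes` as the non-unicode string type.

```python
import re

# Stand-in for the configured zpublisher_default_encoding (assumed value).
DEFAULT_ENCODING = "utf-8"

_CHARSET = re.compile(r'charset="?([\w.-]+)', re.IGNORECASE)

def resolve_encoding(content_type=None, obj=None, default=DEFAULT_ENCODING):
    """Pick the encoding used to decode non-unicode output."""
    if content_type:
        match = _CHARSET.search(content_type)
        if match:
            return match.group(1)
    # "my_output_encoding" is the property name floated in the text.
    encoding = getattr(obj, "my_output_encoding", None)
    return encoding or default

def to_unicode(value, content_type=None, obj=None):
    """Leave unicode text alone; decode byte strings with the resolved encoding."""
    if isinstance(value, str):
        return value
    return value.decode(resolve_encoding(content_type, obj))
```

For example, `to_unicode(b"caf\xe9", "text/html; charset=iso-8859-1")`
decodes with the charset named in the header, while a call without any
header or property falls back to the default.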


just-some-thoughts...
-aj





Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-15 Thread Andreas Jung



--On 15 December 2005 17:55:16 +0100 Martijn Faassen <[EMAIL PROTECTED]> 
wrote:

> Hm, so it auto-decodes non-unicode strings using UTF-8? That's a bit
> dangerous, as you suppress a large class of unicode errors in the code.
> Code that creates UTF-8 strings will be silently accepted. Not as bad as
> what PTS does, at least the output of the ZPT will be unicode, but scary
> nonetheless.
>
> Do you have any impression of how compatible your code is with
> existing large Zope 2 codebases by the way?




I would like to keep the code as strict as possible for now. When the final 
implementation is ready we must test it with Plone and CPS and see what 
problems will occur and how we can deal with them in a sane way (hopefully 
saner than the current implementation). Otherwise we could also stick with 
the current implementation :-) So there is currently no need to hurry... I 
hope to finish the work in January... enough time left for testing.


-aj



Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-15 Thread Martijn Faassen

Andreas Jung wrote:

> --On 9 December 2005 10:11:42 -0500 Jim Fulton <[EMAIL PROTECTED]> wrote:
>
>> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
>> to brainstorm how this can be done and make one or more proposals.
>> This is likely a prerequisite for finishing the publisher and ZPT
>> work.
>
> My ZPT integration of the Z3 templates will definitely only allow 
> unicode. Non-unicode content must pass either an encoding or accept that 
> it will be converted using utf8 as default encoding to unicode.


Hm, so it auto-decodes non-unicode strings using UTF-8? That's a bit 
dangerous, as you suppress a large class of unicode errors in the code. 
Code that creates UTF-8 strings will be silently accepted. Not as bad as 
what PTS does, at least the output of the ZPT will be unicode, but scary 
nonetheless.
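
To make the concern concrete, here is a small sketch contrasting a lenient
renderer that auto-decodes byte strings as UTF-8 with a strict one that
accepts unicode only. Both functions are hypothetical and not taken from
the ZPT code; Python 3 syntax, with `bytes` standing in for non-unicode
strings.

```python
def render_lenient(parts, encoding="utf-8"):
    """Join parts, silently decoding any byte strings with the default encoding."""
    return "".join(
        part.decode(encoding) if isinstance(part, bytes) else part
        for part in parts
    )

def render_strict(parts):
    """Join parts, refusing anything that is not already unicode text."""
    for part in parts:
        if not isinstance(part, str):
            raise TypeError("expected unicode text, got %r" % (part,))
    return "".join(parts)

utf8_bytes = "café".encode("utf-8")      # produced by code that should return unicode
latin1_bytes = "café".encode("latin-1")  # same bug, different encoding

# The lenient renderer accepts the UTF-8 bytes without complaint, so the
# bug in the calling code goes unnoticed.
page = render_lenient(["<p>", utf8_bytes, "</p>"])
```

The strict renderer raises `TypeError` for both byte strings, so the
mistake surfaces immediately. The lenient one fails only for the latin-1
case, with a `UnicodeDecodeError`, and only once such data actually shows
up.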


Do you have any impression of how compatible your code is with 
existing large Zope 2 codebases by the way?


Regards,

Martijn


Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-15 Thread TAHARA Yusei
Hello.

At Fri, 09 Dec 2005 10:11:42 -0500,
Jim Fulton wrote:
> - Common approach to Unicode
> 
> In particular, in Zope 3, all text is stored and managed as Unicode.
> The publisher decodes request data and encodes response data.  The vast
> majority of application and library code can ignore encoding issues.
> (The exceptions are applications and frameworks that need to exchange
> text with non-Unicode-aware external systems.)  This has provided
> great simplifications and allowed us to avoid common pitfalls from
> mixing Unicode and encoded text.
> 
> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
> to brainstorm how this can be done and make one or more proposals.
> This is likely a prerequisite for finishing the publisher and ZPT
> work.

I want to tackle this problem, because I use Japanese and I have
run into the encode/decode error pitfalls over the last few years.
I'm sorry, I haven't been involved in Zope development and haven't read
the list carefully.

In Zope 2.8, there is a ZPT problem that is the same pitfall
(unicode and string are mixed in TAL).

--
TAHARA Yusei
[EMAIL PROTECTED]


Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-14 Thread Andreas Jung



--On 9 December 2005 10:11:42 -0500 Jim Fulton <[EMAIL PROTECTED]> wrote:

> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
> to brainstorm how this can be done and make one or more proposals.
> This is likely a prerequisite for finishing the publisher and ZPT
> work.




My ZPT integration of the Z3 templates will definitely only allow unicode. 
Non-unicode content must pass either an encoding or accept that it will be 
converted using utf8 as default encoding to unicode.


-aj




Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-13 Thread Martijn Faassen

Jim Fulton wrote:
> Martijn Faassen wrote:
>
> [snip]
>> I'll volunteer to help brainstorm on this, but right now my brainstorm 
>> is only very dark and full of lightning.
>
> You and I brainstormed this a few months ago.  I think this was on the
> list.  I think that, for starters, we would arrange that all Zope 3
> views used in Zope 2 would get unicode input.  If you like, I can try
> to find this discussion. :)



Ah, right, that is far less scary, indeed. Your post somehow gave me the 
impression you wanted to change the way current Zope 2 does things, but 
if you limit yourself to what happens with Zope 3 stuff in Zope 2, it's 
less scary.


In fact Five already has hacks to make sure that unicode enters 
Five-generated forms. Replacing these hacks with something solid would 
be good.


>> Anyway, in some basics, Zope 2 does have an approach to unicode for 
>> *output* that's fairly similar to Zope 3's: if you use unicode strings 
>> your entire output (including page templates) will be unicode (if you 
>> don't mix with non-unicode non-ascii strings..). Then the response 
>> encoding setting is read and everything is transformed once into encoded 
>> text. Silva uses this. It also struggles to make sure all its input is 
>> transformed to unicode (among other ways using Formulator).
>>
>> In Plone, the situation is quite different -- its 
>> PlacelessTranslationService monkeypatches into the page template 
>> engine and puts in ways so that you can mix UTF-8 and unicode strings 
>> together. This then goes on to break assumptions of code that uses the 
>> page template engine in a unicode-pure environment (which is what 
>> happened to Silva).
>
> Ick.
>
> I'm not suggesting this is easy.  We may have some messy deprecation
> and backward compatibility code.  But we *do* need to solve this problem
> eventually, and the solution doesn't get any closer without taking steps.


Yes. I'm optimistic about being able to do this for Five-related stuff. 
If this is eventually going to be people's main development system, then 
we can basically say we've solved the important unicode issues.


What I'm worried about is doing this for old code, but some steps will 
probably become clear during the brainstorming session. Migration tools 
that turn strings in the ZODB into unicode ones magically (with the 
ability to spell out exceptions and encoding)? Tricky...


Regards,

Martijn



Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-13 Thread Jim Fulton

Martijn Faassen wrote:
> Jim Fulton wrote:
>
>> I forgot a very important need:
>>
>> - Common approach to Unicode
>>
>> In particular, in Zope 3, all text is stored and managed as Unicode.
>> The publisher decodes request data and encodes response data.  The vast
>> majority of application and library code can ignore encoding issues.
>> (The exceptions are applications and frameworks that need to exchange
>> text with non-Unicode-aware external systems.)  This has provided
>> great simplifications and allowed us to avoid common pitfalls from
>> mixing Unicode and encoded text.
>>
>> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
>> to brainstorm how this can be done and make one or more proposals.
>> This is likely a prerequisite for finishing the publisher and ZPT
>> work.
>
> This is definitely a scary topic, and I speak from years of experience 
> with Zope 2 unicode here. This sounds like a very hard transition that 
> would touch *a lot* of code in non-Zope 2 core. How do you envision all 
> the form inputs to suddenly produce unicode strings, for instance?
>
> We've struggled hard with Formulator to make it work with unicode for 
> instance (and still it's buggy, as I wanted to support the non-unicode 
> scenarios too). I can imagine any system in Zope that uses forms at all 
> would need to be touched.
>
> I'll volunteer to help brainstorm on this, but right now my brainstorm 
> is only very dark and full of lightning.


You and I brainstormed this a few months ago.  I think this was on the
list.  I think that, for starters, we would arrange that all Zope 3
views used in Zope 2 would get unicode input.  If you like, I can try
to find this discussion. :)

> Anyway, in some basics, Zope 2 does have an approach to unicode for 
> *output* that's fairly similar to Zope 3's: if you use unicode strings 
> your entire output (including page templates) will be unicode (if you 
> don't mix with non-unicode non-ascii strings..). Then the response 
> encoding setting is read and everything is transformed once into encoded 
> text. Silva uses this. It also struggles to make sure all its input is 
> transformed to unicode (among other ways using Formulator).
>
> In Plone, the situation is quite different -- its 
> PlacelessTranslationService monkeypatches into the page template engine 
> and puts in ways so that you can mix UTF-8 and unicode strings together. 
> This then goes on to break assumptions of code that uses the page 
> template engine in a unicode-pure environment (which is what happened to 
> Silva).


Ick.

I'm not suggesting this is easy.  We may have some messy deprecation
and backward compatibility code.  But we *do* need to solve this problem
eventually, and the solution doesn't get any closer without taking steps.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


Re: [Zope-dev] RFV: Unicode in Zope 2

2005-12-13 Thread Martijn Faassen

Jim Fulton wrote:

> I forgot a very important need:
>
> - Common approach to Unicode
>
> In particular, in Zope 3, all text is stored and managed as Unicode.
> The publisher decodes request data and encodes response data.  The vast
> majority of application and library code can ignore encoding issues.
> (The exceptions are applications and frameworks that need to exchange
> text with non-Unicode-aware external systems.)  This has provided
> great simplifications and allowed us to avoid common pitfalls from
> mixing Unicode and encoded text.
>
> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
> to brainstorm how this can be done and make one or more proposals.
> This is likely a prerequisite for finishing the publisher and ZPT
> work.


This is definitely a scary topic, and I speak from years of experience 
with Zope 2 unicode here. This sounds like a very hard transition that 
would touch *a lot* of code in non-Zope 2 core. How do you envision all 
the form inputs to suddenly produce unicode strings, for instance?


We've struggled hard with Formulator to make it work with unicode for 
instance (and still it's buggy, as I wanted to support the non-unicode 
scenarios too). I can imagine any system in Zope that uses forms at all 
would need to be touched.


I'll volunteer to help brainstorm on this, but right now my brainstorm 
is only very dark and full of lightning.


Anyway, in some basics, Zope 2 does have an approach to unicode for 
*output* that's fairly similar to Zope 3's: if you use unicode strings 
your entire output (including page templates) will be unicode (if you 
don't mix with non-unicode non-ascii strings..). Then the response 
encoding setting is read and everything is transformed once into encoded 
text. Silva uses this. It also struggles to make sure all its input is 
transformed to unicode (among other ways using Formulator).


In Plone, the situation is quite different -- its 
PlacelessTranslationService monkeypatches into the page template engine 
and puts in ways so that you can mix UTF-8 and unicode strings together. 
This then goes on to break assumptions of code that uses the page 
template engine in a unicode-pure environment (which is what happened to 
Silva).


Regards,

Martijn


[Zope-dev] RFV: Unicode in Zope 2

2005-12-09 Thread Jim Fulton


A few weeks ago, I mentioned 3 big things I'd like to see for
merging Zope 2 and Zope 3:

- Common Publisher

- Common Security frameworks

- Common ZPT implementations

I forgot a very important need:

- Common approach to Unicode

In particular, in Zope 3, all text is stored and managed as Unicode.
The publisher decodes request data and encodes response data.  The vast
majority of application and library code can ignore encoding issues.
(The exceptions are applications and frameworks that need to exchange
text with non-Unicode-aware external systems.)  This has provided
great simplifications and allowed us to avoid common pitfalls from
mixing Unicode and encoded text.
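
The decode-at-the-edge pattern described above can be sketched as follows.
This is a toy illustration, not the Zope 3 publisher: all function names
are invented, and it uses Python 3, where `bytes` is encoded data and
`str` is Unicode text.

```python
def decode_request(raw_form, charset="utf-8"):
    """Boundary, inbound: turn encoded request data into Unicode text."""
    return {key.decode(charset): value.decode(charset)
            for key, value in raw_form.items()}

def greet(form):
    """Application code: sees and returns Unicode text only."""
    return "Hello, %s!" % form["name"]

def encode_response(text, charset="utf-8"):
    """Boundary, outbound: encode the Unicode result exactly once."""
    return text.encode(charset)

def publish(raw_form, charset="utf-8"):
    """Toy publisher tying the three steps together."""
    form = decode_request(raw_form, charset)
    body = greet(form)
    return encode_response(body, charset)

response = publish({b"name": "Jürgen".encode("utf-8")})
```

Because `greet` never sees encoded data, the same application code works
unchanged when the toy publisher is called with a different charset.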

We need to migrate Zope 2 to use a similar strategy.  We need volunteers
to brainstorm how this can be done and make one or more proposals.
This is likely a prerequisite for finishing the publisher and ZPT
work.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org