Re: [Zope] How to convert Zope instance charset?
On Monday, 25 April 2005, 10:42 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> As someone who often works with Java, I absolutely agree with it. I just don't know how to do it with Zope/Plone/other 3rd-party products (not written by me), since they don't use unicode strings. I don't know, maybe Python can be told to use unicode for plain strings as well, but has anybody successfully done that with Zope? Like, what about the C parts of Zope then?

Well, early in Zope's history there was only ASCII, and Python had no Unicode support. Unicode support moved into Zope over time. Some parts still have problems, and these problems will never be solved completely; the sources are just too old. But there is usually a good way to get around a particular problem. Zope 3, in contrast, uses Unicode everywhere, so it is clean by design. Unfortunately there is no magic treat-all-my-strings-as-unicode-strings switch in Python. So if you still have a specific problem, ask again and we might help. In your case it should be easy to convert your UTF-8 data to unicode strings and then sort using the existing methods.

Andreas

___
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
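Editor's illustration of the "convert your UTF-8 data to unicode strings, then sort" suggestion above. This is a minimal sketch in modern Python 3 (where `bytes` plays the role of the old byte string and `str` the role of the old unicode string). The helper names `fold_key` and `sort_utf8` are hypothetical, and the accent-folding key is only a rough stand-in for real Hungarian collation, not what Zope or Python does internally.

```python
import unicodedata


def fold_key(text):
    """Approximate collation key: base letters first, original text as tie-break.

    Assumption: stripping combining accents is "good enough" for the demo.
    A real collation (e.g. Hungarian) has more rules than this.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


def sort_utf8(byte_strings):
    """Decode UTF-8 byte strings to text first, then sort the text."""
    texts = [raw.decode("utf-8") for raw in byte_strings]
    return sorted(texts, key=fold_key)
```

Calling `sort_utf8` on the UTF-8 encodings of "zebra", "álom" and "alma" places "álom" between "alma" and "zebra" instead of at the end of the list.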
Re: [Zope] How to convert Zope instance charset?
Monday, April 25, 2005, 5:34:04 AM, Andreas Jung wrote:

> One last note from my side. I have over 7 years of experience with Unicode from working with multilingual documents in the e-publishing business. It is good practice to perform *any* Unicode-related work *only* on unicode datatypes (Python unicode strings!!!) and *not* on byte-encoded strings such as UTF-8 or whatever. These encodings should only be used at the output level when presenting Unicode data to the user, either through the web, as an export format, etc. This is strong advice you should follow.
>
> -aj

As someone who often works with Java, I absolutely agree with it. I just don't know how to do it with Zope/Plone/other 3rd-party products (not written by me), since they don't use unicode strings. I don't know, maybe Python can be told to use unicode for plain strings as well, but has anybody successfully done that with Zope? Like, what about the C parts of Zope then?

-- 
Best regards,
Daniel Dekany
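Editor's illustration of the advice quoted above (work on unicode strings, use encodings only at the input and output edges). A minimal Python 3 sketch; the function names are hypothetical and the example word is arbitrary.

```python
def decode_input(raw, encoding="utf-8"):
    """Turn bytes from the outside world into text as early as possible."""
    return raw.decode(encoding)


def encode_output(text, encoding="utf-8"):
    """Turn text back into bytes only when presenting or exporting it."""
    return text.encode(encoding)


def shout(text):
    """Some text processing; it operates on text, never on encoded bytes."""
    return text.upper()


raw = "árvíz".encode("utf-8")

# Working on the encoded bytes gives wrong answers for non-ASCII letters:
# the length counts bytes, and upper() only touches the ASCII letters.
byte_length = len(raw)
byte_upper = raw.upper()

# Decoding first gives the expected results.
text = decode_input(raw)
text_length = len(text)
text_upper = encode_output(shout(text))
```

The byte-level `upper()` leaves "á" and "í" untouched, while the text-level version uppercases them; that difference is the practical reason for decoding before processing.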
Re: [Zope] How to convert Zope instance charset?
One last note from my side. I have over 7 years of experience with Unicode from working with multilingual documents in the e-publishing business. It is good practice to perform *any* Unicode-related work *only* on unicode datatypes (Python unicode strings!!!) and *not* on byte-encoded strings such as UTF-8 or whatever. These encodings should only be used at the output level when presenting Unicode data to the user, either through the web, as an export format, etc. This is strong advice you should follow.

-aj
Re: [Zope] How to convert Zope instance charset?
On Sunday, 24 April 2005, 21:31 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> Yeah, I tried to use that earlier, but as I have said many times here, it can't sort UTF-8 encoded strings, even though I have set the "global locale" to something.utf8, certainly because Python's locale.strcoll can't.

If this method does not work as expected, then this is likely a bug or a problem in the underlying implementation in the C library. locale.strcoll is just a *thin* layer on top of the libc of your operating system. That means: Python just passes the data to the libc strcoll() function and returns the result. And again my hint: if you want to deal with different charsets in a reasonable way, use unicode strings.

-aj
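Editor's illustration of the point above that `locale.strcoll` delegates to the C library. A minimal Python 3 sketch: `try_setlocale` and `locale_sorted` are hypothetical helper names, and whether a locale such as `hu_HU.UTF-8` is installed (and how well its collation works) depends entirely on the operating system, which is why the helper falls back quietly.

```python
import functools
import locale


def try_setlocale(candidates):
    """Set LC_COLLATE to the first installed candidate.

    Returns the locale name that was set, or None if none is available.
    """
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_COLLATE, name)
        except locale.Error:
            continue
    return None


def locale_sorted(words):
    """Sort text strings with the C library's collation for the current locale."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))
```

A typical call would be `try_setlocale(["hu_HU.UTF-8", "hu_HU.utf8"])` followed by `locale_sorted(words)`. If no Hungarian locale is installed, the sort silently uses whatever collation is currently active, so the result is only as good as the platform's locale data.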
Re: [Zope] How to convert Zope instance charset?
Sunday, April 24, 2005, 7:22:42 PM, Andreas Jung wrote:

> On Sunday, 24 April 2005, 18:34 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>
>> Maybe *you* don't get the point. Python has a "virtual machine level" setting that specifies the locale and encoding (the charset). You can set it, for example, like this: locale.setlocale(locale.LC_ALL, ('hu_HU', 'ISO-8859-2')). And although there is no charset information attached to strings, locale.strcoll and the like will assume that the string is in the encoding specified globally as above, right? All strings (that are not unicode strings) are assumed to use that encoding. It seems to me that it works like that until I specify 'UTF-8' in the locale, in which case it goes mad.
>
> I am very much aware of the issue (btw. it was me who integrated sequence.sort()).

You are aware of it, so why do you dispute whether UTF-8 is supported (read: can be used in practice)? Just how on earth should anybody solve this "on the application level"? Should everybody use only their own products (that convert everything to "unicode" strings)? And should everybody patch ZCatalog and similar core components, which also sort non-unicode strings? Simply put, people can't use Zope with UTF-8 in practice (while I guess they can with ISO-8859-x, right?); in my original words, "Since Python/Zope/etc practically doesn't support utf-8". And I didn't mean to hurt or upbraid Zope fans with it at all. I just said it as a fact in the middle of a sentence, and then look what happens... (Why the Plone guys use UTF-8 as the default, I don't know; maybe they didn't realize it doesn't work for people who really utilize UTF-8. I've been in the business too long to be surprised. :) I think I will ask them...)

> And if you look carefully at the API of sequence.sort() then you will see that there is already built-in support for locale-aware comparisons.

Yeah, I tried to use that earlier, but as I have said many times here, it can't sort UTF-8 encoded strings, even though I have set the "global locale" to something.utf8, certainly because Python's locale.strcoll can't. So in the end: You Can Not Use UTF-8 with Zope. Right? So, then, back to the original question: converting a UTF-8 instance to an ISO-8859-2 instance, any idea? (Or is it still the standpoint that I should not??? D-%)

-- 
Best regards,
Daniel Dekany
Re: [Zope] How to convert Zope instance charset?
On Sunday, 24 April 2005, 18:34 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> Maybe *you* don't get the point. Python has a "virtual machine level" setting that specifies the locale and encoding (the charset). You can set it, for example, like this: locale.setlocale(locale.LC_ALL, ('hu_HU', 'ISO-8859-2')). And although there is no charset information attached to strings, locale.strcoll and the like will assume that the string is in the encoding specified globally as above, right? All strings (that are not unicode strings) are assumed to use that encoding. It seems to me that it works like that until I specify 'UTF-8' in the locale, in which case it goes mad.

I am very much aware of the issue (btw. it was me who integrated sequence.sort()). And if you look carefully at the API of sequence.sort() then you will see that there is already built-in support for locale-aware comparisons.

-aj
Re: [Zope] How to convert Zope instance charset?
Sunday, April 24, 2005, 6:05:42 PM, Andreas Jung wrote:

> On Sunday, 24 April 2005, 17:45 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>
>> Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
>>
>> First of all, in this thread I don't care whose mistake it is. My concern is whether I can use Zope with UTF-8 (in fact, Plone) in reality or not. Assume that I'm using a few non-US-ASCII characters, and I sometimes want to show things alphabetically sorted...
>
> You're not getting the point. As long as you work with Python strings and not with unicode strings, there is no way in Zope to deal correctly with different kinds of encodings. As I said, it is an application-side problem. Zope and Python provide you the tools to deal with UTF-8, but you need to solve such problems in your application. That's my last comment on this issue :-)

Maybe *you* don't get the point. Python has a "virtual machine level" setting that specifies the locale and encoding (the charset). You can set it, for example, like this: locale.setlocale(locale.LC_ALL, ('hu_HU', 'ISO-8859-2')). And although there is no charset information attached to strings, locale.strcoll and the like will assume that the string is in the encoding specified globally as above, right? All strings (that are not unicode strings) are assumed to use that encoding. It seems to me that it works like that until I specify 'UTF-8' in the locale, in which case it goes mad.

And, to Max M., regarding patching sequence.sort:

a) There is no guarantee that everything uses sequence.sort. Some code may call locale.strcoll directly and the like, which can result in all sorts of inconsistency. The fix should be made at the root of the problem, which I believe is strcoll.

b) If the problem is in Zope (which I doubt), then it should be patched in Zope itself, not by everybody individually. That is, for Andreas Jung: if locale.getlocale(locale.LC_COLLATE) indicates that the default charset is UTF-8, then strings should be sorted accordingly. But again, I think it should actually be fixed at the Python level (in locale.strcoll), and not at the Zope level.

Anyway, I already accepted earlier that while Zope will certainly work with ISO-8859-2 (which locale.strcoll handles correctly), it will not work with UTF-8. Hence, Zope doesn't work well with UTF-8, while it works with "older" charsets (it does so without any extra effort, right?). So I just asked how to switch over to ISO-8859-2, and then some people start telling me that it works with UTF-8, and that it should be solved on the application level(!!!)...

-- 
Best regards,
Daniel Dekany
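Editor's illustration of the idea in point b) above: look at what `locale.getlocale(locale.LC_COLLATE)` reports and interpret byte strings in that encoding before collating. This is a hedged Python 3 sketch of the proposal, not how Zope or Python behaves; `collation_encoding` and `collate_bytes` are hypothetical names, and the UTF-8 fallback is an assumption.

```python
import locale


def collation_encoding(default="utf-8"):
    """Return the encoding named by the LC_COLLATE locale, or a default.

    In the plain "C" locale no encoding is reported, so the default is used.
    """
    try:
        _language, encoding = locale.getlocale(locale.LC_COLLATE)
    except ValueError:
        # Raised for locale names the locale module cannot parse.
        return default
    return encoding or default


def collate_bytes(byte_strings):
    """Sort byte strings by decoding them with the locale's encoding first.

    Raises UnicodeDecodeError if the data is not valid in that encoding.
    """
    encoding = collation_encoding()
    return sorted(byte_strings, key=lambda raw: locale.strxfrm(raw.decode(encoding)))
```

The design choice here is to keep the stored values as bytes and decode only to build the sort key, which is roughly what a central fix would have to do for code that cannot be changed to use unicode strings.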
Re: [Zope] How to convert Zope instance charset?
On Sunday, 24 April 2005, 17:45 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
>
> First of all, in this thread I don't care whose mistake it is. My concern is whether I can use Zope with UTF-8 (in fact, Plone) in reality or not. Assume that I'm using a few non-US-ASCII characters, and I sometimes want to show things alphabetically sorted...

You're not getting the point. As long as you work with Python strings and not with unicode strings, there is no way in Zope to deal correctly with different kinds of encodings. As I said, it is an application-side problem. Zope and Python provide you the tools to deal with UTF-8, but you need to solve such problems in your application. That's my last comment on this issue :-)

-aj
Re: [Zope] How to convert Zope instance charset?
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:

> On Sunday, 24 April 2005, 16:03 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>
>> Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:
>>
>>> On Sunday, 24 April 2005, 14:18 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8,
>>>
>>> Please explain in which sense Zope would not support utf-8. For your information:
>>
>> It can't sort strings alphabetically *anywhere* (concretely: the accented letters go to the end of the list, I guess because 0x80 is numerically greater than the codes of the US-ASCII characters).
>
> This is neither a problem of Zope nor of Python! A Python string has no notion of an encoding. The sort method cannot smell the encoding.

First of all, in this thread I don't care whose mistake it is. My concern is whether I can use Zope with UTF-8 (in fact, Plone) in reality or not. Assume that I'm using a few non-US-ASCII characters, and I sometimes want to show things alphabetically sorted...

Then, of course, if something wants to collate strings for human reading, it will use locale.strcoll, which does consider charset and locale. That locale.strcoll gets UTF-8 wrong is certainly the mistake of Python, right?

> Instead use Python unicode strings and depend on the sorting order defined by the Unicode standard.

I take that advice, but unfortunately it's not about my Python code, but about other people's Python code.

> This is an application-level problem, not a server-side problem.

Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many products rely on it for sorting. And it sorts UTF-8 incorrectly (I guess because locale.strcoll does it incorrectly). Also, ZCatalog sorts incorrectly (surely for the same reason), and it is also part of the standard Zope distribution.

>>> Plone has UTF8 as default charset.
>>
>> Believe me, I really hope I'm wrong. So how can I get strings sorted correctly? If it works for someone, how? (I have locale hu_HU.UTF-8 in zope.conf, and I have even printed locale.getlocale(locale.LC_COLLATE) from products and external methods, and it was hu_HU.UTF-8. Note that at least at the Python level sorting with hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)
>
> See above. Also, the standard sort() method of Python does not care about your locale (why should it?). Strings are streams of bytes, nothing else.

I know, and I have referred to locale.strcoll, which does care about encoding and locale. It seems many products use it (indirectly) when they want to sort something.

> sort() accepts a user-defined comparison function to implement user-specific sorting.

Yes, but this doesn't help, unless I write a UTF-8 comparison method, and then find all sort() and locale.sort() calls in Zope, Plone, and in other products, and patch them all...

> And there are also functions in Python's "locale" module to perform locale-dependent comparison.

Which I can't get working with UTF-8; it puts non-US-ASCII letters at the end of the list. Has somebody? How? I'm all ears. I guess the Plone site would suddenly sort correctly then, at least in the places where the programmer of the Zope product was wise enough not to use raw sort().

> Once again: you must solve your problem on the application layer...

(Anyway, string collation is not an application-level problem in principle. It is the same for a book store application and for a first-person shooter; there is nothing application-specific in it. If Python is not mature enough to take on this task, that's a different question.)

> Zope does not help you at this point because it can't.

So however I formulate it, the end result is that you *practically* can't use UTF-8 with Zope, unless you are using a language that doesn't use non-US-ASCII characters, in which case you don't utilize UTF-8. Hence, I said it is "not supported"... It doesn't mean that it is the mistake of Zope; it just means that you can't use it.

So, back to the topic... Since UTF-8 is not working (it seems), how could I convert that already filled instance to use ISO-8859-2 instead of UTF-8? Is there some tool that would help me do it relatively easily?

-- 
Best regards,
Daniel Dekany
Re: [Zope] How to convert Zope instance charset?
On Sunday, 24 April 2005, 16:03 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:
>
>> On Sunday, 24 April 2005, 14:18 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>>
>>> I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8,
>>
>> Please explain in which sense Zope would not support utf-8. For your information:
>
> It can't sort strings alphabetically *anywhere* (concretely: the accented letters go to the end of the list, I guess because 0x80 is numerically greater than the codes of the US-ASCII characters).

This is neither a problem of Zope nor of Python! A Python string has no notion of an encoding. The sort method cannot smell the encoding. Instead, use Python unicode strings and depend on the sorting order defined by the Unicode standard. This is an application-level problem, not a server-side problem.

>> Plone has UTF8 as default charset.
>
> Believe me, I really hope I'm wrong. So how can I get strings sorted correctly? If it works for someone, how? (I have locale hu_HU.UTF-8 in zope.conf, and I have even printed locale.getlocale(locale.LC_COLLATE) from products and external methods, and it was hu_HU.UTF-8. Note that at least at the Python level sorting with hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)

See above. Also, the standard sort() method of Python does not care about your locale (why should it?). Strings are streams of bytes, nothing else. sort() accepts a user-defined comparison function to implement user-specific sorting. And there are also functions in Python's "locale" module to perform locale-dependent comparison. Once again: you must solve your problem on the application layer. Zope does not help you at this point because it can't.

-aj
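Editor's illustration of the "user-defined comparison" remark above. In the Python 2 of this thread, `list.sort()` took a comparison function directly; in Python 3 the same idea goes through `functools.cmp_to_key`. The alphabet below is a hypothetical, simplified ordering (single letters only, no digraphs such as "cs" or "sz"), so this is a sketch of the mechanism and not a correct Hungarian collation.

```python
import functools

# Assumed, simplified alphabet used only for this demonstration.
ALPHABET = "aábcdeéfghiíjklmnoóöőpqrstuúüűvwxyz"
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def compare(left, right):
    """Old-style comparison: negative, zero, or positive.

    Characters outside ALPHABET all rank after every known letter.
    """
    left_ranks = [RANK.get(ch, len(ALPHABET)) for ch in left.lower()]
    right_ranks = [RANK.get(ch, len(ALPHABET)) for ch in right.lower()]
    return (left_ranks > right_ranks) - (left_ranks < right_ranks)


def custom_sorted(words):
    """Sort text strings using the user-defined comparison function."""
    return sorted(words, key=functools.cmp_to_key(compare))
```

Note that the comparison works on text, so byte strings would still need to be decoded first; that is the "application level" work the thread is arguing about.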
Re: [Zope] How to convert Zope instance charset?
Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:

> On Sunday, 24 April 2005, 14:18 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:
>
>> I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8,
>
> Please explain in which sense Zope would not support utf-8. For your information:

It can't sort strings alphabetically *anywhere* (concretely: the accented letters go to the end of the list, I guess because 0x80 is numerically greater than the codes of the US-ASCII characters). That's a pretty fundamental thing for a portal, or for text handling in general. I asked here earlier how to solve it, but there was no answer that could be applied in practice (i.e. the answer was that I should write custom fixes for each individual product, and/or write a patch for Zope, and then maybe for Python... If so, that is equivalent to saying that UTF-8 is not supported yet).

> Plone has UTF8 as default charset.

Aha. Then this is why the Plone site I have to fix/maintain uses UTF-8 everywhere. I believed it was a bad decision of my predecessor. (But then this problem is even more mysterious to me: *if* it doesn't work (yet), then why did the Plone authors choose it?)

> In general Zope does not care much about encoded strings except for some conversions. Dealing with utf8 might be tricky in some cases but saying Zope does not support Utf-8 is wrong.

Believe me, I really hope I'm wrong. So how can I get strings sorted correctly? If it works for someone, how? (I have locale hu_HU.UTF-8 in zope.conf, and I have even printed locale.getlocale(locale.LC_COLLATE) from products and external methods, and it was hu_HU.UTF-8. Note that at least at the Python level sorting with hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)

-- 
Best regards,
Daniel Dekany
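Editor's illustration of the symptom described above: sorting UTF-8 encoded byte strings compares raw byte values, and every non-ASCII character starts with a byte of 0x80 or higher, so accented words land after "z". A minimal Python 3 sketch with arbitrary example words.

```python
words = ["zebra", "alma", "álom"]

# Encode to UTF-8 and sort the byte strings, as a byte-oriented sort would.
sorted_bytes = sorted(word.encode("utf-8") for word in words)

# Decode again only to make the resulting order readable.
byte_order = [raw.decode("utf-8") for raw in sorted_bytes]

# The first byte of "á" in UTF-8 is 0xC3, which is greater than ord("z") == 0x7A.
first_byte_of_a_acute = "á".encode("utf-8")[0]
```

Here `byte_order` comes out as "alma", "zebra", "álom", which is the misplacement reported in the message.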
Re: [Zope] How to convert Zope instance charset?
On Sunday, 24 April 2005, 14:18 +0200, Daniel Dekany <[EMAIL PROTECTED]> wrote:

> I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8,

Please explain in which sense Zope would not support UTF-8. For your information: Plone has UTF-8 as its default charset. In general Zope does not care much about encoded strings except for some conversions. Dealing with UTF-8 might be tricky in some cases, but saying Zope does not support UTF-8 is wrong.

-aj
[Zope] How to convert Zope instance charset?
I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8, I would like to switch over to ISO-8859-2 (for everything). The problem is that this instance is a large site that has been online for several months, so I have to convert the strings stored inside the many already existing objects in the ZODB. Any idea how to do it? Is there an existing tool for it?

-- 
Best regards,
Daniel Dekany
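Editor's illustration of the conversion asked about above. The thread does not name a tool, so this is only a hedged sketch of the core step in Python 3: re-encoding byte strings from UTF-8 to ISO-8859-2 inside ordinary nested containers. The function name `transcode` is hypothetical, and walking the objects of a real ZODB (and committing the changes) is deliberately not shown.

```python
def transcode(value, source="utf-8", target="iso-8859-2"):
    """Recursively re-encode byte strings found in lists, tuples and dicts.

    Raises UnicodeEncodeError for characters that do not exist in the target
    charset, so data loss is never silent. Other value types pass through.
    """
    if isinstance(value, bytes):
        return value.decode(source).encode(target)
    if isinstance(value, list):
        return [transcode(item, source, target) for item in value]
    if isinstance(value, tuple):
        return tuple(transcode(item, source, target) for item in value)
    if isinstance(value, dict):
        return {
            transcode(key, source, target): transcode(item, source, target)
            for key, item in value.items()
        }
    return value
```

Failing loudly on unencodable characters is a deliberate choice: a site that has been on UTF-8 for months may contain characters outside ISO-8859-2, and those need a decision rather than a silent replacement.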