[Zope-dev] Re: What catalog/index to use ...
Guido van Rossum wrote:

>> I don't know where you would expect a patch to be found, but in this particular case the Zope Collector is a good place to look: http://collector.zope.org/Zope/597 Use the collector, Luke! ;-)
>
> Um, that's not a patch. Can you attach a context or unified diff to the collector item?

sorry for that. but the only difference between old and new code is the (?L)-flag in the reg-exps.

cheers, maik

___ Zope-Dev maillist - [EMAIL PROTECTED] http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Re: What catalog/index to use ...
Hi!

> Please note that former Zope versions already include a dedicated unicode-aware splitter that is already usable with the old TextIndex and maybe with ZCTextIndex. TextIndexNG resolves all these issues by doing the complete internal processing by converting the data into unicode. Every single processing step only handles unicode data. Most older browsers should be able to handle at least UTF-8 as character set. This is sufficient for most cases.

The problem seems to be that ZCTextIndex indeed does not do the splitting right if German Umlauts are used. There is no option for Unicode-aware splitter. Instead of a Vocabulary it uses a Lexicon, which just offers two options: HTML aware splitter and Whitespace splitter. I haven't tested the whitespace splitter yet, but the HTML aware splitter did not do the Umlaut thing right without the patch, i.e. it used umlauts as splitting characters ...

So there is a bug ...

Joachim
Re: [Zope-dev] Re: What catalog/index to use ...
> The problem seems to be that ZCTextIndex indeed does not do the splitting right if German Umlauts are used. There is no option for Unicode-aware splitter. Instead of a Vocabulary it uses a Lexicon, which just offers two options: HTML aware splitter and Whitespace splitter. I haven't tested the whitespace splitter yet, but the HTML aware splitter did not do the Umlaut thing right without the patch, i.e. it used umlauts as splitting characters ...

That's just what the default ZMI interface for ZCTextIndex offers. It's easy to add your own splitter by writing a few lines of Python code. RTSL.

--Guido van Rossum (home page: http://www.python.org/~guido/)
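For readers wondering what "a few lines of Python" could look like, here is a minimal sketch of a splitter in the shape of a lexicon pipeline element. It assumes that a pipeline element is simply an object with a `process()` method that takes a list of strings and returns a flat list of terms. The class name `UmlautAwareSplitter` is made up for illustration, the code is written for modern Python 3 (where `str` is Unicode), and the step of registering the class with ZCTextIndex is deliberately left out; read the ZCTextIndex source for the actual hook.

```python
import re


class UmlautAwareSplitter:
    """Hypothetical splitter: any Unicode letter or digit counts as part of a word."""

    # On Python 3 str patterns, \w matches Unicode word characters by default.
    _word = re.compile(r"\w+")

    def process(self, texts):
        """Split each input string into words and return one flat list."""
        words = []
        for text in texts:
            words.extend(self._word.findall(text))
        return words


if __name__ == "__main__":
    splitter = UmlautAwareSplitter()
    print(splitter.process(["Die Behörde prüft die Mülltonne."]))
```

Because the pattern works on decoded text, the same class handles German, French, or Spanish input without any per-language variant.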
[Zope-dev] Re: What catalog/index to use ...
Guido van Rossum wrote:

>> The problem seems to be that ZCTextIndex indeed does not do the splitting right if German Umlauts are used. There is no option for Unicode-aware splitter. Instead of a Vocabulary it uses a Lexicon, which just offers two options: HTML aware splitter and Whitespace splitter. I haven't tested the whitespace splitter yet, but the HTML aware splitter did not do the Umlaut thing right without the patch, i.e. it used umlauts as splitting characters ...
>
> That's just what the default ZMI interface for ZCTextIndex offers. It's easy to add your own splitter by writing a few lines of Python code. RTSL.

of course everyone can write his own Splitter... one for german, one for french, etc. but what is the problem with the patch? is Python's regexp (?L) flag not intended for just this simple way of localizing software?

and think of the european market: no one will buy Zope if it is not working with your native language out of the box. and that's what the patch is for...

cheers, maik

--
Maik Jablonski
www.zfl.uni-bielefeld.de
Deutsche Zope User Group
Bielefeld, Germany
www.dzug.org
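To make the `(?L)` discussion concrete, here is a small sketch of what the flag does. This is not the patch from the collector. It is written for modern Python 3, where `(?L)` is only accepted on bytes patterns, and it assumes a locale named `de_DE.ISO8859-1` may be installed; locale names differ between platforms, so the code falls back to whatever locale is current. The helper names `try_set_ctype` and `split_words` are hypothetical.

```python
import locale
import re

# (?L) makes \w follow the current locale instead of plain ASCII.
# In Python 3 the flag is only accepted for bytes patterns.
LOCALE_WORD = re.compile(rb"(?L)\w+")


def try_set_ctype(name):
    """Try to switch LC_CTYPE; return False if the locale is not available."""
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    return True


def split_words(data):
    """Split 8-bit encoded bytes into words using the locale's idea of a letter."""
    return LOCALE_WORD.findall(data)


if __name__ == "__main__":
    text = "mülltonne waschbär".encode("latin-1")
    # Assumption: this locale name exists on the machine running the sketch.
    if try_set_ctype("de_DE.ISO8859-1"):
        print("German locale:", split_words(text))
    else:
        print("German locale not installed; current locale gives:", split_words(text))
```

The result depends on process-wide state: the same pattern splits the same bytes differently depending on which locale was set last, which is the trade-off raised elsewhere in this thread.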
Re: [Zope-dev] Re: What catalog/index to use ...
>>> The problem seems to be that ZCTextIndex indeed does not do the splitting right if German Umlauts are used. There is no option for Unicode-aware splitter. Instead of a Vocabulary it uses a Lexicon, which just offers two options: HTML aware splitter and Whitespace splitter. I haven't tested the whitespace splitter yet, but the HTML aware splitter did not do the Umlaut thing right without the patch, i.e. it used umlauts as splitting characters ...
>>
>> That's just what the default ZMI interface for ZCTextIndex offers. It's easy to add your own splitter by writing a few lines of Python code. RTSL.
>
> of course everyone can write his own Splitter... one for german, one for french, etc. but what is the problem with the patch? is Python's regexp (?L) flag not intended for just this simple way of localizing software?
>
> and think of the european market: no one will buy Zope if it is not working with your native language out of the box. and that's what the patch is for...

I must've missed the start of this thread (I only just signed up for this list). I didn't see any patch -- I thought it was just a gripe about ZCTextIndex. Of course patches are welcome -- where can I find this particular patch?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Zope-dev] Re: What catalog/index to use ...
> I must've missed the start of this thread (I only just signed up for this list). I didn't see any patch -- I thought it was just a gripe about ZCTextIndex. Of course patches are welcome -- where can I find this particular patch?

Hi Guido!

I don't know where you would expect a patch to be found, but in this particular case the Zope Collector is a good place to look: http://collector.zope.org/Zope/597 Use the collector, Luke! ;-)

Joachim

P.S.: I guess most of the people on the zope-dev list have some clue on how to write their own splitters, but the message of my gripe was that something worked o.k. (for the dumb end user) with the old TextIndex and doesn't with the thing that is advertised on the Add form as the replacement, and that just isn't cool.
Re: [Zope-dev] Re: What catalog/index to use ...
> I don't know where you would expect a patch to be found, but in this particular case the Zope Collector is a good place to look: http://collector.zope.org/Zope/597 Use the collector, Luke! ;-)

Um, that's not a patch. Can you attach a context or unified diff to the collector item?

> Joachim
>
> P.S.: I guess most of the people on the zope-dev list have some clue on how to write their own splitters, but the message of my gripe was that something worked o.k. (for the dumb end user) with the old TextIndex and doesn't with the thing that is advertised on the Add form as the replacement, and that just isn't cool.

Indeed.

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Zope-dev] Re: What catalog/index to use ...
Joachim Werner wrote:

> Does it make sense to get ZCTextIndex fixed (there seems to be a patch in the collector already) or should I go with TextIndexNG? If yes, is it ready for production environments?

hi,

I've submitted a patch for locale-support for ZCTextIndex: http://collector.zope.org/Zope/597

I run this patch on a production site [most content is german] without any problems. I think, if all test-cases for ZCTextIndex succeed with this patch, it should be merged into the next official release so all european zopers can use ZCTextIndex without patching it...

in my opinion: high priority for this one done :-)

cheers, maik
Re: [Zope-dev] Re: What catalog/index to use ...
The main reason I have not merged this already is that I lack a sample to make a new test with. If someone can provide me with some content samples that break now, but work with the patch, I will make a new test and check in the fix for 2.7, and perhaps 2.6.1 if desired.

-Casey

On Friday 08 November 2002 11:27 am, Maik Jablonski wrote:

> Joachim Werner wrote:
>
>> Does it make sense to get ZCTextIndex fixed (there seems to be a patch in the collector already) or should I go with TextIndexNG? If yes, is it ready for production environments?
>
> hi,
>
> I've submitted a patch for locale-support for ZCTextIndex: http://collector.zope.org/Zope/597
>
> I run this patch on a production site [most content is german] without any problems. I think, if all test-cases for ZCTextIndex succeed with this patch, it should be merged into the next official release so all european zopers can use ZCTextIndex without patching it...
>
> in my opinion: high priority for this one done :-)
>
> cheers, maik
[Zope-dev] Re: What catalog/index to use ...
Casey Duncan wrote:

> The main reason I have not merged this already is that I lack a sample to make a new test with. If someone can provide me with some content samples that break now, but work with the patch, I will make a new test and check in the fix for 2.7, and perhaps 2.6.1 if desired.
>
> -Casey

hi Casey,

try some words with german umlaute. things like:

mülltonne
waschbär
behörde
überflieger

the last one will work without the patch. explanation: the first character is split off [non-ascii-character], both when storing the word in the Lexicon and when resolving the query-words through the queryparser. so in both cases it will end up as 'berflieger', and searching for 'überflieger' will give you correct results. this is the reason why some people think that ZCTextIndex works with german 'umlaute', but it does not...;-)

hope this is helpful.

cheers, maik
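The behaviour described above can be reproduced with a short sketch. This is not the ZCTextIndex splitter itself: it is a modern Python 3 stand-in in which `re.ASCII` plays the role of a splitter that only knows ASCII letters, and the function names `split_ascii` and `split_unicode` are invented for the example.

```python
import re

# Stand-in for a splitter that only knows ASCII letters; not the ZCTextIndex code.
ASCII_WORD = re.compile(r"\w+", re.ASCII)
# On Python 3 str patterns, \w is Unicode-aware by default.
UNICODE_WORD = re.compile(r"\w+")


def split_ascii(text):
    """Split text, treating every non-ASCII character as a separator."""
    return ASCII_WORD.findall(text)


def split_unicode(text):
    """Split text, keeping non-ASCII letters inside words."""
    return UNICODE_WORD.findall(text)


if __name__ == "__main__":
    for word in ["mülltonne", "waschbär", "behörde", "überflieger"]:
        print(word, split_ascii(word), split_unicode(word))
```

With the ASCII-only pattern, 'mülltonne' becomes the two fragments 'm' and 'lltonne', while 'überflieger' becomes the single fragment 'berflieger'. Since the query goes through the same splitter, the query fragment matches the indexed fragment, which is why that one word appears to work.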
Re: [Zope-dev] Re: What catalog/index to use ...
Hi!

Some additional remarks:

While making the splitting dependent on the locale settings (as done in the old TextIndex) helps with most use cases, I'm not sure if that is the right thing to do in the long run. Locale settings are good for client software, i.e. if you want to have a program behave German for a German user etc. But a web server might be located in the U.S., but frequented by German-speaking and Spanish-speaking users. Or even users from China or Japan. In these cases only Unicode will help I think. After all you can not have more than one locale at a time.

But honestly I still don't understand the Unicode thing well enough. My main concern is whether a Unicode-enabled site will still work with older browsers and for all platforms ...

> Casey Duncan wrote:
>
>> The main reason I have not merged this already is that I lack a sample to make a new test with. If someone can provide me with some content samples that break now, but work with the patch, I will make a new test and check in the fix for 2.7, and perhaps 2.6.1 if desired.
>>
>> -Casey
>
> hi Casey,
>
> try some words with german umlaute. things like:
>
> mülltonne
> waschbär
> behörde
> überflieger
>
> the last one will work without the patch. explanation: the first character is split off [non-ascii-character], both when storing the word in the Lexicon and when resolving the query-words through the queryparser. so in both cases it will end up as 'berflieger', and searching for 'überflieger' will give you correct results. this is the reason why some people think that ZCTextIndex works with german 'umlaute', but it does not...;-)

This patch will probably not hurt anybody. And it would make ZCTextIndex behave like TextIndex.

OT: I don't want to be too pedantic about that, but usually I'd expect a replacement to really replace all of the functionality of the thing it replaces. TextIndex was locale-aware (and this was even documented somewhere), so switching to ZCTextIndex should not break anything, at least not in a Zope final.

But that's what I've told you all the time: Why do you make things final releases before they are really tested? 2.6.0 has a really bad bug with the DateTime module (Lennart Regebro has provided a fix for it: http://www.zope.org/Members/regebro/datetime_260_fix) that was introduced after 2.6.0b1. This just shouldn't be possible ... Anybody listening? ;-)

Cheers

Joachim
Re: [Zope-dev] Re: What catalog/index to use ...
--On Friday, 8 November 2002 19:49 +0100 Joachim Werner [EMAIL PROTECTED] wrote:

> Hi!
>
> Some additional remarks:
>
> While making the splitting dependent on the locale settings (as done in the old TextIndex) helps with most use cases, I'm not sure if that is the right thing to do in the long run. Locale settings are good for client software, i.e. if you want to have a program behave German for a German user etc. But a web server might be located in the U.S., but frequented by German-speaking and Spanish-speaking users. Or even users from China or Japan. In these cases only Unicode will help I think. After all you can not have more than one locale at a time.
>
> But honestly I still don't understand the Unicode thing well enough. My main concern is whether a Unicode-enabled site will still work with older browsers and for all platforms ...

Please note that former Zope versions already include a dedicated unicode-aware splitter that is already usable with the old TextIndex and maybe with ZCTextIndex.

TextIndexNG resolves all these issues by doing the complete internal processing by converting the data into unicode. Every single processing step only handles unicode data.

Most older browsers should be able to handle at least UTF-8 as character set. This is sufficient for most cases.

=aj
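The "convert everything to unicode first" approach can be illustrated with a short sketch. It is not TextIndexNG's code; it assumes modern Python 3, that the encoding of each incoming document is known, and a hypothetical helper named `split_bytes`.

```python
import re

# \w on a Python 3 str pattern matches letters from any script.
WORD = re.compile(r"\w+")


def split_bytes(data, encoding):
    """Decode raw bytes to str first, then split on Unicode word characters."""
    return WORD.findall(data.decode(encoding))


if __name__ == "__main__":
    german = "Die Behörde".encode("latin-1")
    spanish = "El niño".encode("utf-8")
    # Both documents are handled in the same process, with no locale switching.
    print(split_bytes(german, "latin-1"))
    print(split_bytes(spanish, "utf-8"))
```

Decoding at the boundary means the splitter never depends on the server's locale, so German and Spanish content can be indexed side by side. Languages written without spaces between words, such as Chinese or Japanese, would still need a real word segmenter; a `\w+` pattern alone does not provide that.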