[Zope-dev] Re: What catalog/index to use ...

2002-11-10 Thread Maik Jablonski
Guido van Rossum wrote:

I don't know where you would expect a patch to be found, but in this
particular case the Zope Collector is a good place to look:

http://collector.zope.org/Zope/597

Use the collector, Luke! ;-)



Um, that's not a patch.  Can you attach a context or unified diff to
the collector item?


sorry about that. the only difference between the old and new code is the
(?L) flag in the regexps.
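
For readers who have not seen the flag: `(?L)` is the inline spelling of `re.LOCALE`, which makes `\w` follow the current C locale's idea of an alphanumeric character. What follows is only a minimal sketch, assuming modern Python 3, where the flag is accepted on bytes patterns only (the 2002 code ran on Python 2 byte strings). The pattern and the helper name `split_words` are illustrative and not copied from the collector item.

```python
import re

# Locale-dependent word pattern: (?L) is the inline spelling of re.LOCALE.
# Python 3 accepts this flag only for bytes patterns.
WORD_LOCALE = re.compile(rb"(?L)\w+")

# The same pattern without the flag: for bytes, \w is ASCII-only.
WORD_PLAIN = re.compile(rb"\w+")

def split_words(rx, data):
    """Return all word tokens that the compiled pattern finds in data."""
    return rx.findall(data)

text = "waschbär".encode("latin-1")

# Without (?L) the byte 0xE4 (ä) is never a word character, so the word breaks.
print(split_words(WORD_PLAIN, text))

# With (?L) the result depends on the active locale (see locale.setlocale);
# under a Latin-1 German locale the word should stay whole.
print(split_words(WORD_LOCALE, text))
```

Because the second result depends on the locale of the running process, no expected output is given for it here.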

cheers, maik




___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Joachim Werner
Hi!

 Please note that former Zope versions already include a dedicated
 unicode-aware splitter that is already usable with the old TextIndex and
 maybe with ZCTextIndex.
 TextIndexNG resolves all these issues by doing the complete internal
 processing by converting the data into unicode. Every single processing
 step only handles unicode data.

 Most older browsers should be able to handle at least UTF-8 as character
 set. This is sufficient for most cases.

The problem seems to be that ZCTextIndex indeed does not do the splitting
right if German Umlauts are used. There is no option for a Unicode-aware
splitter. Instead of a Vocabulary it uses a Lexicon, which just offers two
options: HTML aware splitter and Whitespace splitter. I haven't tested
the whitespace splitter yet, but the HTML aware splitter did not do the
Umlaut thing right without the patch, i.e. it used umlauts as splitting
characters ...

So there is a bug  ...

Joachim





Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Guido van Rossum
 The problem seems to be that ZCTextIndex indeed does not do the
 splitting right if German Umlauts are used. There is no option for a
 Unicode-aware splitter.  Instead of a Vocabulary it uses a
 Lexicon, which just offers two options: HTML aware splitter and
 Whitespace splitter. I haven't tested the whitespace splitter yet,
 but the HTML aware splitter did not do the Umlaut thing right
 without the patch, i.e. it used umlauts as splitting characters ...

That's just what the default ZMI interface for ZCTextIndex offers.
It's easy to add your own splitter by writing a few lines of Python
code.  RTSL.
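
As a rough illustration of what such a splitter might look like: the sketch below assumes that a lexicon pipeline element is simply an object with a `process` method that takes a list of strings and returns a list of words. The class name `UnicodeSplitter` is hypothetical, the code targets modern Python 3 rather than the Python 2 of 2002, and registering the class with the Lexicon is left out. Check the ZCTextIndex source for the actual interface.

```python
import re

class UnicodeSplitter:
    """Hypothetical pipeline element that splits text into words.

    Assumption: the lexicon calls process() with a list of strings and
    expects a flat list of words back.
    """

    # For str patterns in Python 3, \w already covers non-ASCII letters.
    rx = re.compile(r"\w+")

    def process(self, lst):
        result = []
        for s in lst:
            # Collect every run of word characters from each input string.
            result.extend(self.rx.findall(s))
        return result

splitter = UnicodeSplitter()
print(splitter.process(["Die Behörde leert die Mülltonne."]))
# prints ['Die', 'Behörde', 'leert', 'die', 'Mülltonne']
```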

--Guido van Rossum (home page: http://www.python.org/~guido/)




[Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Maik Jablonski
Guido van Rossum wrote:

The problem seems to be that ZCTextIndex indeed does not do the
splitting right if German Umlauts are used. There is no option for a
Unicode-aware splitter.  Instead of a Vocabulary it uses a
Lexicon, which just offers two options: HTML aware splitter and
Whitespace splitter. I haven't tested the whitespace splitter yet,
but the HTML aware splitter did not do the Umlaut thing right
without the patch, i.e. it used umlauts as splitting characters ...



That's just what the default ZMI interface for ZCTextIndex offers.
It's easy to add your own splitter by writing a few lines of Python
code.  RTSL.


of course everyone can write their own splitter... one for german, one for
french, etc. but what is the problem with the patch? isn't python's regexp
flag (?L) intended for exactly this simple way of localizing software?

and think of the european market:

no one will buy Zope if it does not work with their native language
out of the box. and that's what the patch is for...

cheers, maik
--
Maik Jablonski __o
www.zfl.uni-bielefeld.de _ \_Deutsche Zope User Group
Bielefeld, Germany  (_)/(_)   www.dzug.org






Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Guido van Rossum
 The problem seems to be that ZCTextIndex indeed does not do the
 splitting right if German Umlauts are used. There is no option for a
 Unicode-aware splitter.  Instead of a Vocabulary it uses a
 Lexicon, which just offers two options: HTML aware splitter and
 Whitespace splitter. I haven't tested the whitespace splitter yet,
 but the HTML aware splitter did not do the Umlaut thing right
 without the patch, i.e. it used umlauts as splitting characters ...
  
  That's just what the default ZMI interface for ZCTextIndex offers.
  It's easy to add your own splitter by writing a few lines of Python
  code.  RTSL.
 
 of course everyone can write their own splitter... one for german, one
 for french, etc. but what is the problem with the patch? isn't python's
 regexp flag (?L) intended for exactly this simple way of localizing
 software?
 
 and think of the european market:
 
 no one will buy Zope if it does not work with their native language
 out of the box. and that's what the patch is for...

I must've missed the start of this thread (I only just signed up for
this list).  I didn't see any patch -- I thought it was just a gripe
about ZCTextIndex.  Of course patches are welcome -- where can I find
this particular patch?

--Guido van Rossum (home page: http://www.python.org/~guido/)




Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Joachim Werner
 I must've missed the start of this thread (I only just signed up for
 this list).  I didn't see any patch -- I thought it was just a gripe
 about ZCTextIndex.  Of course patches are welcome -- where can I find
 this particular patch?

Hi Guido!

I don't know where you would expect a patch to be found, but in this
particular case the Zope Collector is a good place to look:

http://collector.zope.org/Zope/597

Use the collector, Luke! ;-)


Joachim


P.S.: I guess most of the people on the zope-dev list have some clue on how
to write their own splitters, but the message of my gripe was that
something worked o.k. (for the dumb end user) with the old TextIndex and
doesn't with the thing that is advertised on the Add form as the
replacement, and that just isn't cool.






Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-09 Thread Guido van Rossum
 I don't know where you would expect a patch to be found, but in this
 particular case the Zope Collector is a good place to look:
 
 http://collector.zope.org/Zope/597
 
 Use the collector, Luke! ;-)

Um, that's not a patch.  Can you attach a context or unified diff to
the collector item?

 Joachim
 
 P.S.: I guess most of the people on the zope-dev list have some clue
 on how to write their own splitters, but the message of my gripe
 was that something worked o.k. (for the dumb end user) with the old
 TextIndex and doesn't with the thing that is advertised on the Add
 form as the replacement, and that just isn't cool.

Indeed.

--Guido van Rossum (home page: http://www.python.org/~guido/)




[Zope-dev] Re: What catalog/index to use ...

2002-11-08 Thread Maik Jablonski
Joachim Werner wrote:

Does it make sense to get ZCTextIndex fixed (there seems to be a patch in
the collector already) or should I go with TextIndexNG? If yes, is it ready
for production environments?


hi,

I've submitted a patch for locale-support for ZCTextIndex:

http://collector.zope.org/Zope/597

I run this patch on a production site [most content is german] without
any problems. I think that if all test cases for ZCTextIndex succeed with
this patch, it should be merged into the next official release so all
european zopers can use ZCTextIndex without patching it... in my
opinion: high priority to get this one done :-)

cheers, maik






Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-08 Thread Casey Duncan
The main reason I have not merged this already is that I lack a sample to make
a new test with. If someone can provide me with some content samples that
break now, but work with the patch, I will make a new test and check in the
fix for 2.7, and perhaps 2.6.1 if desired.

-Casey

On Friday 08 November 2002 11:27 am, Maik Jablonski wrote:
 Joachim Werner wrote:
  Does it make sense to get ZCTextIndex fixed (there seems to be a patch in
  the collector already) or should I go with TextIndexNG? If yes, is it
  ready for production environments?
 
 hi,
 
 I've submitted a patch for locale-support for ZCTextIndex:
 
 http://collector.zope.org/Zope/597
 
 I run this patch on a production site [most content is german] without
 any problems. I think that if all test cases for ZCTextIndex succeed with
 this patch, it should be merged into the next official release so all
 european zopers can use ZCTextIndex without patching it... in my
 opinion: high priority to get this one done :-)
 
 cheers, maik
 
 
 
 



[Zope-dev] Re: What catalog/index to use ...

2002-11-08 Thread Maik Jablonski
Casey Duncan wrote:

The main reason I have not merged this already is that I lack a sample to make
a new test with. If someone can provide me with some content samples that
break now, but work with the patch, I will make a new test and check in the
fix for 2.7, and perhaps 2.6.1 if desired.

-Casey

hi Casey,

try some words with german umlaute. things like:

mülltonne
waschbär
behörde
überflieger

the last one will work without the patch. explanation: the first
character is split off [it is a non-ascii character], both when storing
the word in the Lexicon and when resolving the query words through the
query parser. so in both cases it ends up as

berflieger

so searching for 'überflieger' will give you correct results. this is the
reason why some people think that ZCTextIndex works with german
'umlaute', but it does not...;-)
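
The effect described above can be reproduced outside Zope. The sketch below is not the ZCTextIndex code. It assumes modern Python 3 and uses `re.ASCII` as a stand-in for the unpatched, ASCII-only splitter, and the default Unicode-aware `\w` as a stand-in for the patched, locale-aware one.

```python
import re

ASCII_WORDS = re.compile(r"\w+", re.ASCII)   # stand-in for the unpatched splitter
UNICODE_WORDS = re.compile(r"\w+")           # stand-in for the patched splitter

samples = ["mülltonne", "waschbär", "behörde", "überflieger"]

for word in samples:
    # The same splitter is applied to indexed text and to the query,
    # so both sides see the same (possibly broken) tokens.
    print(word, ASCII_WORDS.findall(word), UNICODE_WORDS.findall(word))
```

With the ASCII-only pattern, 'überflieger' becomes the single token 'berflieger' at both index time and query time, which is why a search for it still matches, while 'mülltonne' breaks into 'm' and 'lltonne'.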

hope this is helpful.

cheers, maik






Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-08 Thread Joachim Werner
Hi!

Some additional remarks: While making the splitting dependent on the locale
settings (as done in the old TextIndex) helps with most use cases, I'm not
sure that is the right thing to do in the long run. Locale settings are
good for client software, e.g. if you want a program to behave in a German
way for a German user.

A web server, however, might be located in the U.S. but frequented by
German-speaking and Spanish-speaking users, or even users from China or
Japan. In these cases only Unicode will help, I think. After all, you cannot
have more than one locale at a time. But honestly, I still don't understand
the Unicode thing well enough. My main concern is whether a Unicode-enabled
site will still work with older browsers and on all platforms ...
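
To make the locale-versus-Unicode point concrete, here is a small sketch, assuming modern Python 3 where `str` is Unicode text. It tokenizes German, Spanish and Japanese words with one pattern and without touching the process-wide locale. The function name `tokenize` and the sample string are made up for illustration.

```python
import re

# One Unicode-aware pattern; no call to locale.setlocale() is involved.
WORDS = re.compile(r"\w+")

def tokenize(text):
    """Split Unicode text into lower-cased word tokens."""
    return [w.lower() for w in WORDS.findall(text)]

print(tokenize("Mülltonne, señor, 日本語"))
# prints ['mülltonne', 'señor', '日本語']
```

A pattern like this keeps the characters intact, but it does not segment languages that are written without spaces between words; that would need a dedicated splitter.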

 Casey Duncan wrote:
  The main reason I have not merged this already is that I lack a sample
  to make a new test with. If someone can provide me with some content
  samples that break now, but work with the patch, I will make a new test
  and check in the fix for 2.7, and perhaps 2.6.1 if desired.
 
  -Casey

 hi Casey,

 try some words with german umlaute. things like:

 mülltonne
 waschbär
 behörde
 überflieger

 the last one will work without the patch. explanation: the first
 character is split off [it is a non-ascii character], both when storing
 the word in the Lexicon and when resolving the query words through the
 query parser. so in both cases it ends up as

 berflieger

 so searching for 'überflieger' will give you correct results. this is the
 reason why some people think that ZCTextIndex works with german
 'umlaute', but it does not...;-)

This patch will probably not hurt anybody. And it would make ZCTextIndex
behave like TextIndex.

OT:

I don't want to be too pedantic about that, but usually I'd expect a
replacement to really replace all of the functionality of the thing it
replaces.
TextIndex was locale-aware (and this was even documented somewhere), so
switching to ZCTextIndex should not break anything, at least not in a Zope
final.

But that's what I've told you all the time: Why do you make things final
releases before they are really tested? 2.6.0 has a really bad bug with the
DateTime module (Lennart Regebro has provided a fix for it:
http://www.zope.org/Members/regebro/datetime_260_fix) that was introduced
after 2.6.0b1. This just shouldn't be possible ...

Anybody listening? ;-)

Cheers

Joachim





Re: [Zope-dev] Re: What catalog/index to use ...

2002-11-08 Thread Andreas Jung


--On Friday, 8 November 2002 19:49 +0100 Joachim Werner
[EMAIL PROTECTED] wrote:

Hi!

Some additional remarks: While making the splitting dependent on the
locale settings (as done in the old TextIndex) helps with most use cases,
I'm not sure that is the right thing to do in the long run. Locale
settings are good for client software, e.g. if you want a program to
behave in a German way for a German user.

A web server, however, might be located in the U.S. but frequented by
German-speaking and Spanish-speaking users, or even users from China or
Japan. In these cases only Unicode will help, I think. After all, you
cannot have more than one locale at a time. But honestly, I still don't
understand the Unicode thing well enough. My main concern is whether a
Unicode-enabled site will still work with older browsers and on all
platforms ...



Please note that former Zope versions already include a dedicated
unicode-aware splitter that is already usable with the old TextIndex and
maybe with ZCTextIndex.
TextIndexNG resolves all these issues by doing the complete internal
processing by converting the data into unicode. Every single processing
step only handles unicode data.

Most older browsers should be able to handle at least UTF-8 as character
set. This is sufficient for most cases.
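
The "unicode everywhere internally" approach can be sketched as follows. This is not TextIndexNG code: the function names and the Latin-1 input encoding are assumptions for illustration, written for modern Python 3.

```python
import re

WORDS = re.compile(r"\w+")

def to_unicode(raw, encoding="latin-1"):
    """Decode incoming bytes once, at the boundary (encoding is assumed)."""
    return raw.decode(encoding)

def index_terms(text):
    """Every processing step sees only Unicode text."""
    return [w.lower() for w in WORDS.findall(text)]

def to_browser(text):
    """Encode once on the way out; UTF-8 is the charset suggested above."""
    return text.encode("utf-8")

raw = "Überflieger und Waschbär".encode("latin-1")
terms = index_terms(to_unicode(raw))
print(terms)                        # prints ['überflieger', 'und', 'waschbär']
print(to_browser(" ".join(terms)))
```

Decoding once on input and encoding once on output keeps the splitter, the lexicon and the query parser free of any per-locale logic.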

=aj

