Re: [Zope] Pre-indexing filter and accented letters (mostly solved)

2005-06-16 Thread Yuri


How to make a ZCTextIndex index and search accented words as their non-accented equivalents.

- Step 1

Add the following code to Lexicon.py (around line 190). It defines an 
element that filters each word passing through the lexicon pipeline:


---

class RemoveAccented:
    """Lexicon pipeline element mapping Latin-1 accented letters to ASCII."""

    # Latin-1 code point -> ASCII replacement
    xlate = {
        0xc0:'A', 0xc1:'A', 0xc2:'A', 0xc3:'A', 0xc4:'A', 0xc5:'A',
        0xc6:'Ae', 0xc7:'C',
        0xc8:'E', 0xc9:'E', 0xca:'E', 0xcb:'E',
        0xcc:'I', 0xcd:'I', 0xce:'I', 0xcf:'I',
        0xd0:'Th', 0xd1:'N',
        0xd2:'O', 0xd3:'O', 0xd4:'O', 0xd5:'O', 0xd6:'O', 0xd8:'O',
        0xd9:'U', 0xda:'U', 0xdb:'U', 0xdc:'U',
        0xdd:'Y', 0xde:'th', 0xdf:'ss',
        0xe0:'a', 0xe1:'a', 0xe2:'a', 0xe3:'a', 0xe4:'a', 0xe5:'a',
        0xe6:'ae', 0xe7:'c',
        0xe8:'e', 0xe9:'e', 0xea:'e', 0xeb:'e',
        0xec:'i', 0xed:'i', 0xee:'i', 0xef:'i',
        0xf0:'th', 0xf1:'n',
        0xf2:'o', 0xf3:'o', 0xf4:'o', 0xf5:'o', 0xf6:'o', 0xf8:'o',
        0xf9:'u', 0xfa:'u', 0xfb:'u', 0xfc:'u',
        0xfd:'y', 0xfe:'th', 0xff:'y',
        0xa1:'!', 0xa2:'{cent}', 0xa3:'{pound}', 0xa4:'{currency}',
        0xa5:'{yen}', 0xa6:'|', 0xa7:'{section}', 0xa8:'{umlaut}',
        0xa9:'{C}', 0xaa:'{^a}', 0xab:'<<', 0xac:'{not}',
        0xad:'-', 0xae:'{R}', 0xaf:'_', 0xb0:'{degrees}',
        0xb1:'{+/-}', 0xb2:'{^2}', 0xb3:'{^3}', 0xb4:"'",
        0xb5:'{micro}', 0xb6:'{paragraph}', 0xb7:'*', 0xb8:'{cedilla}',
        0xb9:'{^1}', 0xba:'{^o}', 0xbb:'>>',
        0xbc:'{1/4}', 0xbd:'{1/2}', 0xbe:'{3/4}', 0xbf:'?',
        0xd7:'*', 0xf7:'/'
    }

    def filter_word(self, w):
        """Replace the non-ASCII letters of w with ASCII equivalents."""

        # convert the byte string w to unicode...
        parola = unicode(w, 'latin-1')

        r = ''
        for i in parola:
            if ord(i) in self.xlate:
                r += self.xlate[ord(i)]
            else:
                # no mapping (plain ASCII, or a Latin-1 code missing from
                # the table): keep the original byte unchanged
                r += i.encode('latin-1')

        return r

    def process(self, lst):
        # the lexicon passes a list of words and expects a list back
        return [self.filter_word(w) for w in lst]

element_factory.registerFactory('Remove Accented',
                                'Remove Accented',
                                RemoveAccented)

---

- Step 2

Enable locale support for a Latin-1 language. I added -L it_IT to the Zope 
start command (in 2.7 you have to enable it in etc/zope.conf instead).


Then you can search for "aççented" and find "accented" ;-)
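
For comparison, here is a minimal standalone sketch of the same idea that does not depend on Zope. It is written for a current Python 3 interpreter (not the Python bundled with Zope 2.7), and it swaps the hand-written table for Unicode decomposition from the standard unicodedata module. The names strip_accents and RemoveAccentedSketch are made up for illustration. Decomposition only handles letters that carry combining marks, so characters such as ß or æ pass through unchanged, unlike with the xlate table in Step 1.

```python
import unicodedata


def strip_accents(word):
    """Return word with combining accents removed (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class RemoveAccentedSketch:
    """Pipeline-style element: process() takes and returns a list of words."""

    def process(self, lst):
        return [strip_accents(w) for w in lst]


words = RemoveAccentedSketch().process(["aççented", "actualitè", "plain"])
print(words)
```

The class keeps the same process(list) shape as the Step 1 element, so the two approaches differ only in how a single word is mapped.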
___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )


Re: [Zope] Pre-indexing filter and accented letters

2005-06-09 Thread Dieter Maurer
Please stay on the list -- readded...

Yuri wrote at 2005-6-9 13:18 +0200:
>>Please read the ZCatalog chapter of the Zope Book carefully
>>if you do not understand why using a new name can help you
>>with this...
>>  
>>
>http://www.plope.com/Books/2_7Edition/SearchingZCatalog.stx
>
> There's no mention of an index called "NormalizedSearchableText". There's 
>SearchableText, but it is not related to the topic...

Of course, "NormalizedSearchableText" is *not* mentioned.
It is not a predefined index. Instead, you have to create it yourself.

Please read the chapter again. You are looking for the
(general) description of how the catalog interacts with the object to
determine which values it should index for that object.

Once you have understood that, you will understand
my proposal to solve your problem...

> ...
> I mean, I know that chapter, I know ZCatalog. What I want is to prefilter 
>an existing, named index.

You cannot prefilter an existing index (I told you already!).
You must create a new one, define a script with the name of
the new index and there do your normalization.
You can trust me (in this regard) ...


If you use a "ZCTextIndex", then you can keep the "SearchableText"
name for the (new!) index. In this case, you must use the name of
your normalizing script as "Indexed attributes" in the
definition of your "ZCTextIndex".
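
As a rough illustration of the mechanism described above, the following standalone Python 3 sketch models how a catalog index obtains the value to index: it looks up an attribute with the configured name on the object and calls it if it is callable. Everything here is a simplified stand-in, not Zope code. Document, value_to_index and normalize are hypothetical names, and in a real site NormalizedSearchableText would be a Script (Python) found through acquisition rather than a method on the class.

```python
import unicodedata


def normalize(text):
    """Strip combining accents from text (illustrative normalizer)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class Document:
    """Stand-in for a content object that offers SearchableText()."""

    def __init__(self, text):
        self.text = text

    def SearchableText(self):
        return self.text

    def NormalizedSearchableText(self):
        # Plays the role of the normalizing script.
        return normalize(self.SearchableText())


def value_to_index(obj, attribute_name):
    """Simplified model: fetch the named attribute, call it if callable."""
    value = getattr(obj, attribute_name, None)
    if callable(value):
        value = value()
    return value


doc = Document("actualitè du jour")
print(value_to_index(doc, "SearchableText"))
print(value_to_index(doc, "NormalizedSearchableText"))
```

Under this model, pointing the index at a different attribute name is all it takes to change what gets indexed, which is why a new name (or a new "Indexed attributes" setting) is needed.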


-- 
Dieter


Re: [Zope] Pre-indexing filter and accented letters

2005-06-08 Thread Dieter Maurer
Yuri wrote at 2005-6-8 10:11 +0200:
> ...
>>When you rebuild it, you can also give it a different name.
>>  
>>
>
> Why? I usually gave it the name of the form input I want to index...

Because you want to include a processing step.

Please read the ZCatalog chapter of the Zope Book carefully
if you do not understand why using a new name can help you
with this...

> I thought I would just index the new objects... but maybe I am missing 
>the picture; what is so important about the names 
>"NormalizedSearchableText" and "SearchableText"?

Read the chapter mentioned above. Come back if you still
have questions afterwards...

-- 
Dieter


Re: [Zope] Pre-indexing filter and accented letters

2005-06-08 Thread Yuri

Dieter Maurer wrote:

>Yuri wrote at 2005-6-7 10:37 +0200:
>> ...
>>>Implement a PythonScript that performs the normalization of
>>>"context.SearchableText()", say "NormalizedSearchableText".
>>>
>>>Ensure it is acquirable by your indexed objects.
>>>
>>>Index "NormalizedSearchableText" rather than "SearchableText"
>>>and use this index for your searches.
>>>
>>>Ensure that you perform the same normalization on search
>>>terms before you use them in a query.
>>
>> Well, I cannot change the index, it already has its name... it is a 
>>collection of thousands of objects, and the ones I want to pre-filter 
>>before indexing are just a small part...
>
>But your index currently has unnormalized values.
>Thus, you must rebuild it.

I don't need it for the other objects I already have. But, as a bonus, it 
would not be so bad, so it is not really a problem :)

>When you rebuild it, you can also give it a different name.

Why? I usually give it the name of the form input I want to index...

I thought I would just index the new objects... but maybe I am missing 
the picture; what is so important about the names 
"NormalizedSearchableText" and "SearchableText"?

>> Or you mean I have to do something about SearchableText()?
>
>Yes, replace it by "NormalizedSearchableText".

How? :-? Maybe I am missing some overloading or acquisition?

>> Can I hook somewhere in the middle, so I index them in the way I want? :)
>
>You can (and must) normalize the search terms.
>However, the indexed values need to be normalized, too.

Ok

>Almost surely, they are not now. This means rebuilding the
>index -- this time with normalization...

And how do I add it? Just by creating the Python script and using 
acquisition? How does it work? :P



Re: [Zope] Pre-indexing filter and accented letters

2005-06-07 Thread Dieter Maurer
Yuri wrote at 2005-6-7 10:37 +0200:
> ...
>>Implement a PythonScript that performs the normalization of
>>"context.SearchableText()", say "NormalizedSearchableText".
>>
>>Ensure, it is acquirable by your indexed objects.
>>
>>Index "NormalizedSearchableText" rather than "SearchableText"
>>and use this index for your searches.
>>
>>Ensure, that you perform the same normalization on search
>>terms before you use them in a query.
>>  
>>
>
> Well, I cannot change the index, it already has its name... it is a 
>collection of thousands of objects, and the ones I want to pre-filter 
>before indexing are just a small part...

But your index currently has unnormalized values.
Thus, you must rebuild it.

When you rebuild it, you can also give it a different name.

> Or you mean I have to do something about  SearchableText()?

Yes, replace it by "NormalizedSearchableText".

> I have to index in such a way that users find a term even if they do not 
>type the accented letters, on an existing index that already holds 
>thousands of objects...

I have understood that...
And my advice applied to precisely this situation...

> Can I hook somewhere in the middle, so I Index them in the way I want? :)

You can (and must) normalize the search terms.
However, the indexed values need to be normalized, too.

Almost surely, they are not now. This means rebuilding the
index -- this time with normalization...
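
A toy illustration of why the rebuild is unavoidable, as a standalone Python 3 sketch. This is not ZCatalog code: the dictionaries stand in for an index, and normalize is a hypothetical helper. A real rebuild would reindex the objects themselves rather than re-key the stored words, so treat the loop as a simplification.

```python
import unicodedata


def normalize(word):
    """Strip combining accents from word (illustrative normalizer)."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Existing index, built without normalization: word -> document ids.
old_index = {"actualitè": {1}, "archive": {2}}

# Normalizing only the query does not help: the stored key keeps its accent.
query = normalize("actualitè")
print(query in old_index)

# Rebuilding passes every stored word through the same normalizer.
new_index = {}
for word, doc_ids in old_index.items():
    new_index.setdefault(normalize(word), set()).update(doc_ids)

print(new_index.get(query))
```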


-- 
Dieter


Re: [Zope] Pre-indexing filter and accented letters

2005-06-07 Thread Yuri

Dieter Maurer wrote:

>Yuri wrote at 2005-6-6 11:56 +0200:
>>I would like to index a text property of an object in the ZCatalog. The 
>>text is in French, and my problem is that searches must also match the 
>>corresponding non-accented letters!
>>
>> I mean, if I search for "actualite", the index should also return 
>>the objects whose text contains "actualitè".
>
>Implement a PythonScript that performs the normalization of
>"context.SearchableText()", say "NormalizedSearchableText".
>
>Ensure it is acquirable by your indexed objects.
>
>Index "NormalizedSearchableText" rather than "SearchableText"
>and use this index for your searches.
>
>Ensure that you perform the same normalization on search
>terms before you use them in a query.

Well, I cannot change the index, it already has its name... it is a 
collection of thousands of objects, and the ones I want to pre-filter 
before indexing are just a small part...

Or you mean I have to do something about SearchableText()?

I have to index in such a way that users find a term even if they do not 
type the accented letters, on an existing index that already holds 
thousands of objects...

Can I hook somewhere in the middle, so I index them in the way I want? :)

>By the way, "ManagableIndex" greatly facilitates the inclusion
>of normalizers. However, it currently does not interface with
>a "TextIndex" (only a "WordIndex").

I'll take a look, thanks :)


Re: [Zope] Pre-indexing filter and accented letters

2005-06-06 Thread Dieter Maurer
Yuri wrote at 2005-6-6 11:56 +0200:
>I would like to index a text property of an object in the ZCatalog. The 
>text is in French, and my problem is that searches must also match the 
>corresponding non-accented letters!
>
> I mean, if I search for "actualite", the index should also return 
>the objects whose text contains "actualitè".

Implement a PythonScript that performs the normalization of
"context.SearchableText()", say "NormalizedSearchableText".

Ensure it is acquirable by your indexed objects.

Index "NormalizedSearchableText" rather than "SearchableText"
and use this index for your searches.

Ensure that you perform the same normalization on search
terms before you use them in a query.
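
To make the last point concrete, here is a small standalone Python 3 sketch with a toy in-memory word index. It is not ZCatalog code; build_index, search, identity and normalize are hypothetical helpers. It suggests that a query matches only when indexed words and search terms go through the same normalization.

```python
import unicodedata


def normalize(text):
    """Strip combining accents from text (illustrative normalizer)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def identity(text):
    """No normalization at all."""
    return text


def build_index(docs, normalizer):
    """Map each normalized word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in normalizer(text).split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, term, normalizer):
    """Look up a single term after applying the same normalizer."""
    return index.get(normalizer(term), set())


docs = {1: "actualitè du jour", 2: "archive"}
raw_index = build_index(docs, identity)
normalized_index = build_index(docs, normalize)

print(search(raw_index, "actualite", identity))          # no match
print(search(normalized_index, "actualite", normalize))  # finds document 1
print(search(normalized_index, "actualitè", normalize))  # finds document 1 too
```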

By the way, "ManagableIndex" greatly facilitates the inclusion
of normalizers. However, it currently does not interface with
a "TextIndex" (only a "WordIndex").

  


-- 
Dieter


[Zope] Pre-indexing filter and accented letters

2005-06-06 Thread Yuri
I would like to index a text property of an object in the ZCatalog. The 
text is in French, and my problem is that searches must also match the 
corresponding non-accented letters!


I mean, if I search for "actualite", the index should also return 
the objects whose text contains "actualitè".


I cannot convert the index to TextIndexNG; right now it is a TextIndex (at 
most, I can convert it to a ZCTextIndex).


One idea would be, for example, to convert the text before it gets indexed...

Where should I look in the code? Is it possible? Any other 
suggestions? :)


TIA!
