Hi!

How about converting the entities in the source documents to their
respective Unicode characters and putting the result in the search index?
I think that would be the cleanest solution.
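
A minimal sketch of that idea, in Python only for brevity (this is not
Magnolia or Jackrabbit code). It assumes the indexer can be given a text
normalization step; `normalize_for_index` is a hypothetical name, and the
only real library call is `html.unescape` from the standard library.

```python
import html


def normalize_for_index(text: str) -> str:
    """Decode HTML character references (named or numeric) to Unicode.

    Hypothetical helper: where such a step would hook into the real
    indexing pipeline is an assumption and is not shown here.
    """
    return html.unescape(text)


# Example content as a WYSIWYG editor might store it (assumed format).
stored = "<p>T&auml;st</p>"
indexed = normalize_for_index(stored)
```

With the entities decoded at index time, the query side would need no
changes, and text extracted from binary documents would be matched by the
same unescaped search term.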

Greetings,
Felix

On Feb 19, 2011, at 2:08 PM, Jan Haderka wrote:

> 
> On Feb 16, 2011, at 8:39 AM, frank rittinger wrote:
> 
>> Hi Jan,
>>  
>> wouldn’t your suggestion make it impossible to find text in uploaded 
>> documents, e.g. PDF etc.?
> 
> Yeah, that's what I meant by saying "There is still the problem of
> searching on the full text of binary docs."
> 
>>  
>> I went for something similar to your other solution: build an OR-query 
>> that searches for either the original search term (found in titles, 
>> documents, etc.) or the HTML-escaped search term, instead of running two 
>> separate queries. It looks like this solution works.
>>  
>> Best regards,
>>  
>> Frank
>>  
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Jan Haderka
>> Sent: Tuesday, February 15, 2011 21:04
>> To: Magnolia User-List
>> Subject: Re: [magnolia-user] simple fulltext search and non ASCII characters 
>> in wysiwyg fields
>>  
>>  
>> On Feb 11, 2011, at 9:19 PM, frank rittinger wrote:
>> 
>> 
>> Hi list,
>>  
>> I just realized that the simple search cannot find non-ASCII characters 
>> (e.g. “Täst”) that were edited in a WYSIWYG editor, e.g. in a TextImage 
>> paragraph. However, searching for “T&auml;st” finds the expected result. 
>> Is there an easy way to also find characters that were transformed to 
>> HTML entities by the fckEdit?
>>  
>> One possible solution would be to replace all non-ASCII characters in the 
>> SearchModel with their corresponding HTML entities. Is there a more 
>> straightforward way?
>>  
>> I would say the simplest solution is to extend the Edit control with a 
>> simple flag "escapeChars", which would escape all special characters to 
>> HTML entities on save, so that Edit content is stored the same way as 
>> FckEdit content.
>> The second step would then be to perform the same escaping on the search 
>> query prior to the search. 
>> Since all the content will be escaped, you should get all the hits no 
>> matter whether it was created with Edit or FckEdit. There is still the 
>> problem of searching on the full text of binary docs.
>>  
>> The other option is to perform two search runs (one for the escaped and 
>> one for the unescaped query) and merge the results.
>>  
>>  
>> HTH,
>> Jan
>> 
>> 
>>  
>> Best Regards,
>>  
>> Frank
>>  
>> 
>> ----------------------------------------------------------------
>> For list details see
>> http://www.magnolia-cms.com/home/community/mailing-lists.html
>> To unsubscribe, E-mail to: <[email protected]>
>> ----------------------------------------------------------------

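For comparison, the query-side workaround Frank describes in the quoted
thread (one OR-query over the original and the HTML-escaped search term)
might look roughly like the sketch below. It is written in Python for
brevity and makes several assumptions: `escape_non_ascii`, `build_or_terms`
and `matches` are hypothetical names, plain substring matching stands in for
the real repository query, and the editor is assumed to store named entities
such as `&auml;`. Non-ASCII characters without a named entity are left as
they are.

```python
from html.entities import codepoint2name


def escape_non_ascii(term: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity."""
    out = []
    for ch in term:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            # ASCII, or no named entity known: keep the character as-is.
            out.append(ch)
    return "".join(out)


def build_or_terms(term: str) -> list[str]:
    """Return the alternatives to OR together in a single query."""
    escaped = escape_non_ascii(term)
    return [term] if escaped == term else [term, escaped]


def matches(text: str, term: str) -> bool:
    """Stand-in for the real query: true if any alternative occurs."""
    return any(alt in text for alt in build_or_terms(term))
```

Because the unescaped term stays in the query, text extracted from binary
documents such as PDFs is still found, while the escaped alternative covers
content saved by the WYSIWYG editor.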

