Hi! How about converting the entities in the source documents to their respective Unicode characters and putting the result in the search index? I think that would be the cleanest solution.
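
To make the idea concrete, here is a minimal sketch in Python. It is not Magnolia code: the function names `normalize_for_index` and `matches` are made up for illustration, and the substring check only stands in for a real index lookup. It assumes the wysiwyg content is stored as HTML with named entities such as `&auml;`.

```python
import html
import re

# Very rough tag stripper; assumption: the stored markup is simple wysiwyg HTML.
TAG_RE = re.compile(r"<[^>]+>")


def normalize_for_index(markup: str) -> str:
    """Strip tags and decode HTML entities so the index holds plain Unicode text."""
    text = TAG_RE.sub(" ", markup)
    text = html.unescape(text)      # "T&auml;st" becomes "Täst"
    return " ".join(text.split())   # collapse leftover whitespace


def matches(query: str, indexed_text: str) -> bool:
    """Hypothetical stand-in for the search: case-insensitive substring match."""
    return query.casefold() in indexed_text.casefold()


stored = "<p>Ein T&auml;st im Absatz</p>"   # what the wysiwyg editor saved
indexed = normalize_for_index(stored)       # what would go into the index

print(indexed)
print(matches("Täst", indexed))
```

With the entities decoded at index time, the user's query needs no escaping at all, and text extracted from binary documents (PDF etc.) is searched the same way as wysiwyg content.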

Greetings,
Felix

On Feb 19, 2011, at 2:08 PM, Jan Haderka wrote:

>
> On Feb 16, 2011, at 8:39 AM, frank rittinger wrote:
>
>> Hi Jan,
>>
>> wouldn’t your suggestion make it impossible to find text in uploaded
>> documents, e.g. PDF etc.?
>
> Yeah, that's what I meant by saying "There is still the problem of
> searching on the full text of binary docs."
>
>>
>> I went for something similar to your other solution: build an OR-query
>> that searches for the original search term (found in titles, documents,
>> etc.) or the HTML-escaped search term, instead of two separate queries.
>> It looks like this solution works.
>>
>> Best regards,
>>
>> Frank
>>
>> From: [email protected]
>> [mailto:[email protected]] On behalf of Jan Haderka
>> Sent: Tuesday, February 15, 2011 21:04
>> To: Magnolia User-List
>> Subject: Re: [magnolia-user] simple fulltext search and non ASCII characters
>> in wysiwyg fields
>>
>>
>> On Feb 11, 2011, at 9:19 PM, frank rittinger wrote:
>>
>>
>> Hi list,
>>
>> I just realized that the simple search cannot find non-ASCII characters
>> (e.g. “Täst”) that were edited in a wysiwyg editor, e.g. a TextImage
>> paragraph. However, searching for “T&auml;st” finds the expected result.
>> Is there an easy way to also find characters that were transformed to HTML
>> entities by the fckedit?
>>
>> One possible solution would be to replace all non-ASCII characters in the
>> SearchModel with their corresponding HTML entities. Is there a more
>> straightforward way?
>>
>> I would say the simplest solution is to extend the Edit control with a
>> simple flag "escapeChars", which would then escape all HTML entities on
>> save so the edit content will be the same as FckEdit content.
>> The second step would then be to perform the same escaping on the search
>> query prior to the search.
>> Since all the content will be escaped, you should get all the hits no matter
>> whether they were created with edit or fckEdit. There is still the problem
>> of searching on the full text of binary docs.
>>
>> The other option is to perform two search runs (one for the escaped and one
>> for the non-escaped query) and merge the results.
>>
>>
>> HTH,
>> Jan
>>
>>
>>
>> Best Regards,
>>
>> Frank

----------------------------------------------------------------
For list details see
http://www.magnolia-cms.com/home/community/mailing-lists.html
To unsubscribe, E-mail to: <[email protected]>
----------------------------------------------------------------
