Polish websites typically use Cp1250 (windows-1250), ISO-8859-2, or, of course, UTF-8. Check whether diacritics such as
ęółąśćżń display correctly when the page is decoded in each of these encodings, and use whichever one works.

Dawid

On Wed, Sep 16, 2009 at 4:47 PM, MilleBii <mille...@gmail.com> wrote:
> Same thing when there is
> charset=ISO-8859-2
>
> 2009/9/16 MilleBii <mille...@gmail.com>
>
>> Not sure where to look for explanations:
>>
>> I have a problem with some Polish pages which I cannot index properly on
>> the specific Polish characters such as:
>> Ł
>>
>> They have the following charset=windows-1252
>>
>> Does the HTML parser convert them into their Unicode equivalent ....
>>
>> --
>> -MilleBii-
>>
>
> --
> -MilleBii-
>
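
As a rough way to run the check suggested above, here is a minimal Python sketch. It is not from the original thread, and the names `SAMPLE`, `CANDIDATES`, `encodable`, and `decode_as` are hypothetical, chosen only for illustration. It tests which candidate encodings can represent the Polish diacritics at all, and shows what happens when bytes written as windows-1250 are read as windows-1252, the charset the problem pages declare.

```python
# Hypothetical sample: the diacritics from the message plus the capital Ł
# mentioned in the original question.
SAMPLE = "ęółąśćżńŁ"

# Candidate encodings, using Python's codec names for
# windows-1250, ISO-8859-2, UTF-8 and windows-1252.
CANDIDATES = ["cp1250", "iso8859_2", "utf-8", "cp1252"]


def encodable(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def decode_as(raw, encoding):
    """Decode raw bytes, replacing undecodable bytes with U+FFFD."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Which encodings can hold the Polish characters at all?
    for enc in CANDIDATES:
        print(enc, encodable(SAMPLE, enc))

    # Bytes produced by a windows-1250 page, read with the wrong charset label.
    raw = "Ł".encode("cp1250")
    print(raw, "->", decode_as(raw, "cp1252"))
```

windows-1252 has no Ł at all, so a page that really contains Polish text but is labelled windows-1252 is most likely mislabelled; decoding its bytes as windows-1250 or ISO-8859-2 and checking the result by eye, as suggested above, is one way to confirm that.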