Re: Improper display of Polish characters coded in UTF-8 on Polish OS
On Wednesday, May 18, 2011, 7:53:39, RS wrote:

> As far as I remember you can have parts of the page in one coding and the other part in other coding. Enforcing the coding from Header would mess up such messages - my second thought.

I'm talking about the MIME part header, not the global header. Each MIME section can have its own Content-Type header, so each part can have its own character set.

-- 
Jernej Simončič http://eternallybored.org/
[ The Bat! 5.0.12.2 on Windows 7 6.1.7601.Service Pack 1 ]

Much work, much food; little work, little food; no work, burial at sea.
    -- Cook's Law

Current beta is 5.0.12.3 | 'Using TBBETA' information: http://www.silverstones.com/thebat/TBUDLInfo.html
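Note: the per-part charset idea can be checked outside the mail client. Below is a minimal sketch using Python's standard `email` package. The sample message is made up for illustration (hypothetical boundary `b2` and body text), loosely modelled on the headers quoted in this thread, and `part_charsets` is a helper name invented here.

```python
from email import message_from_string

# Hypothetical sample message: boundary name and bodies are invented,
# only the idea (one charset per MIME part) comes from the thread.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b2"

--b2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

plain text part
--b2
Content-Type: text/html; charset=BIG5
Content-Transfer-Encoding: 8bit

<html><body>html part</body></html>
--b2--
"""


def part_charsets(raw):
    """Return (content type, declared charset) for every non-container part."""
    msg = message_from_string(raw)
    return [
        (part.get_content_type(), part.get_content_charset())
        for part in msg.walk()
        if not part.is_multipart()
    ]


for ctype, charset in part_charsets(RAW):
    print(ctype, charset)
```

Each leaf part reports its own charset, independent of the other parts and of the top-level header. This only reads the MIME headers; it says nothing about what a META tag inside the HTML body declares.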
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hello Marek,

On Tue, 17 May 2011 21:23:19 +0200 GMT (18/May/11, 2:23 AM +0700 GMT), Marek Mikus wrote:

> If it's improper coding matter - manual switch to Chinese coding and back to UTF-8 should not change anything, right?

> The problem is likely that HTML encoding and MIME part encoding differ. TB! picks the wrong one and if you switch it is forced to use the correct one.

MM> TB uses META tag defined in HTML page, what is wrong with it? Yes, chars
MM> are displayed wrong, but this is sender's client fault, if defines
MM> different charset in HTML tag than is used for chars encoding.

Maybe what you wrote only sounds weird over here, so I apologize in advance. Obviously, I don't understand the technology. But when you switch away to another random encoding and then back to UTF-8, suddenly the message is displayed correctly. So if displaying it incorrectly is correct, then displaying it correctly must certainly be wrong?

As a user I can say that what appears to be wrong with TB!'s behaviour is that it displayed the message wrong in the first place; that's what I see without knowing anything about encoding.

Are you certain that there is no email client in the world that can display the message correctly *without* having to switch to another (random) encoding and back? Is this standard behaviour according to the RFC? The statement that it's the sender's client's fault that you cannot see their message correctly in TB! does sound a bit weird to my ears, but I may be misunderstanding you or the whole situation.

To ease things, can I create a keyboard shortcut to switch to Chinese and back, or better yet, a filter that makes it happen automatically?

MM> It is possible force TB to use MIME header instead HTML tag - add a DWORD
MM> registry variable HtmlCharSetPriority with value 1 under
MM> HKEY_CURRENT_USER\Software\RIT\The Bat! and MIME charset will have
MM> precedence over HTML header charset.

I am certain that you do not expect any user to mess with the registry.

-- 
Cheers,
Thomas.
http://thomas.fernandez.hat-gar-keine-homepage.de/

Message reply created with The Bat! 4.2.44
under Windows XP 5.1 Build 2600 Service Pack 3
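Note: for anyone who would rather not edit the registry by hand, the setting Marek describes could be packaged as a .reg file. The sketch below only builds the file text in Python and touches nothing on the system. The key path, value name and value 1 are taken from Marek's message as quoted and have not been verified here; `reg_file_text` is a helper name invented for this example.

```python
def reg_file_text(key_path, value_name, dword):
    """Build the text of a .reg file that sets a single DWORD value."""
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        f'"{value_name}"=dword:{dword:08x}\r\n'
    )


# Key, value name and data as given in Marek's message (unverified assumption).
text = reg_file_text(
    r"HKEY_CURRENT_USER\Software\RIT\The Bat!",
    "HtmlCharSetPriority",
    1,
)
print(text)
```

Saving the printed text as a .reg file and importing it should have the same effect as adding the value manually, assuming the setting works as described. Exporting the key first makes it easy to undo.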
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Raymund,

> I can understand that they have problem with Chinese characters on Polish/English OS but improper displaying of Polish characters coded in UTF-8 on Polish OS is a bit too much

> Please check if it is a HTML message that uses incorrect encoding information.

It's Text and HTML - part of the header is below.

Content-Type: multipart/related; type=text/html; boundary=b1_3219bfe0432f612b608e63a94c72116c
X-Spam-Status: No, score=-3.4 required=6.0 tests=ALL_TRUSTED,AWL,BAYES_00,
    EXTRA_MPART_TYPE,HTML_MESSAGE autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail.8a.pl

--b1_3219bfe0432f612b608e63a94c72116c
Content-Type: multipart/alternative; boundary=b2_3219bfe0432f612b608e63a94c72116c

--b2_3219bfe0432f612b608e63a94c72116c
Content-Type: text/plain; charset = UTF-8
Content-Transfer-Encoding: 8bit

//** Comment
--b1 - is text message
--b2 - is HTML message
**//

It's recognized by TB! as UTF-8, but you need to manually switch to another coding and back to UTF-8 to have it shown properly.

If it's an improper coding matter, a manual switch to Chinese coding and back to UTF-8 should not change anything, right?

-- 
Best regards,
RS
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Raymund,

> I can understand that they have problem with Chinese characters on Polish/English OS but improper displaying of Polish characters coded in UTF-8 on Polish OS is a bit too much

> Please check if it is a HTML message that uses incorrect encoding information.

If it's an encoding matter, it would be another bug that TB! shows the message properly after switching the coding to something else and back to UTF-8.

-- 
Best regards,
RS
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Raymund,

> If it's improper coding matter - manual switch to Chinese coding and back to UTF-8 should not change anything, right?

> The problem is likely that HTML encoding and MIME part encoding differ. TB! picks the wrong one and if you switch it is forced to use the correct one. What happens if you force TB! to hide the HTML part. Is the text part shown correct?

In Text mode the Polish characters are shown properly.

But there is another problem, although I am not sure whether it is actually proper behaviour in TB!. It looks like these guys mixed nbsp; into Text mode. They use nbsp; in Text mode as a space between words to keep them together, e.g.:

Hit Dnia to nasza oferta specjalna,nbsp;ważna codziennienbsp;do godziny24:00.nbsp;Dziękisubskrypcjinaszegonewslettera będziesznbsp;zawsze...

And that is probably their mistake.

-- 
Best regards,
RS
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
On Tuesday, May 17, 2011, 21:23:19, Marek Mikus wrote:

> TB uses META tag defined in HTML page, what is wrong with it?

Shouldn't the charset defined in headers take precedence? At least in HTTP, the charset in HTTP headers always overrides any meta tags.

-- 
Jernej Simončič http://eternallybored.org/
[ The Bat! 5.0.12.2 on Windows 7 6.1.7601.Service Pack 1 ]

If God had intended us to go around naked, He would have made us that way.
    -- Olum's Observation (and see Martha's Maxim and Farrow's Finding)
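Note: the two precedence rules being debated (MIME part header first versus HTML META tag first) can be written down as a small sketch. This is not how The Bat! is implemented; `meta_charset`, `pick_charset` and the `mime_first` flag are names invented here, and the regular expression is a deliberately simple approximation rather than a real HTML parser.

```python
import re

# Simplified pattern: finds charset=... inside the first <meta ...> tag that has one.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def meta_charset(html_bytes):
    """Return the charset named in a META tag, lowercased, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(mime_charset, html_bytes, mime_first):
    """Choose a charset for an HTML part under one of the two precedence rules."""
    from_meta = meta_charset(html_bytes)
    from_mime = mime_charset.lower() if mime_charset else None
    ordered = [from_mime, from_meta] if mime_first else [from_meta, from_mime]
    for candidate in ordered:
        if candidate:
            return candidate
    return None  # nothing declared anywhere


html = b'<meta content="text/html; charset=iso-8859-2" http-equiv="Content-Type">'
print(pick_charset("UTF-8", html, mime_first=True))
print(pick_charset("UTF-8", html, mime_first=False))
```

With `mime_first=True` the part header wins, which is the HTTP-like rule; with `mime_first=False` the META tag wins, which is the behaviour Marek describes for TB!. The two rules only give different answers when the sender declares two different charsets.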
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Dear Marek,

RS> If it's improper coding matter - manual switch to Chinese
RS> coding and back to UTF-8 should not change anything, right?

Raymund> The problem is likely that HTML encoding and MIME part
Raymund> encoding differ. TB! picks the wrong one and if you switch
Raymund> it is forced to use the correct one.

Marek> TB uses META tag defined in HTML page, what is wrong with it?

I am sorry, but there is something wrong with the display result.

If TB! uses the coding from the HTML META tag for display, and HEADER=UTF-8, HTML_META=iso-8859-2 and the characters are CODED=UTF-8, a manual switch would work as you described - forced to ignore HEADER but use HTML_META. But that forced use of UTF-8 should apply to one message only, not to all messages in the folder.

After a manual switch on one message from the sender, all messages from that sender in the same folder are shown properly. So, after a switch on one message, TB! ignores HEADER on all messages from the same sender?

In the same folder I have a message coded in BIG5, and when I switch the Polish message to UTF-8 the BIG5 message is messed up.

Content-transfer-encoding: base64
Content-Type: text/html; charset=BIG5

Does View/Charset apply to a whole folder instead of a single message?

When you go through the menu to Char Set on the Chinese message, it still shows BIG5 coding, not UTF-8. After a manual switch to something else and back to BIG5 I can see the Chinese characters. But after that switch on the BIG5 message, the Polish sender's messages show Chinese characters in place of Polish letters. After a manual switch to UTF-8 on the Polish message, the Chinese message is messed up again (although its view is still set to BIG5).

AUTODETECT

Selecting Autodetect on the BIG5 message makes From, To, Sender (that one is sometimes OK, sometimes messed up as well), Mailer and Message ID totally unreadable.

When, after Autodetect was selected, I manually set the Chinese message to BIG5, it is shown properly, but the Polish characters in the Polish message are Chinese (with UTF-8 coding shown for that message).

Any idea why switching causes such behaviour? Is it proper behaviour or rather a malfunction?

Screens were uploaded.

Marek> Yes, chars are displayed wrong, but this is sender's client
Marek> fault,

The characters are coded in UTF-8 (checked with a binary editor) and HTML_META points to iso-8859-2, but if TB! uses that info, why are Chinese characters shown in the Polish message?

Marek> if defines different charset in HTML tag than is used
Marek> for chars encoding.

You are right here, they messed up the message a bit - and that's good, because that way I was able to find out that, at least for BIG5, Autodetect is not working properly.

MESSED UP MESSAGE

a) header shows usage of UTF-8

--b2_3219bfe0432f612b608e63a94c72116c
Content-Type: text/plain; charset = UTF-8
Content-Transfer-Encoding: 8bit

b) HTML body directs to iso-8859-2

<meta content="text/html; charset=iso-8859-2" http-equiv="Content-Type">

Marek> It is possible force TB to use MIME header instead HTML tag -
Marek> add a DWORD registry variable HtmlCharSetPriority with value 1
Marek> under HKEY_CURRENT_USER\Software\RIT\The Bat! and MIME charset
Marek> will have precedence over HTML header charset.

Thank you for that info.

-- 
Best regards,
RS
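Note: the effect of the mismatch in the "messed up message" (bytes coded in UTF-8, META tag claiming iso-8859-2) can be reproduced in a few lines of Python. This is only an illustration of decoding the same bytes under different charsets; it does not show what The Bat! does internally. The sample word is taken from the newsletter text quoted earlier in the thread.

```python
# Sample text from the newsletter quoted in this thread.
text = "ważna codziennie"

# What the sender actually put on the wire: UTF-8 bytes.
raw = text.encode("utf-8")

# Decoding with the charset from the MIME part header (UTF-8) round-trips.
right = raw.decode("utf-8")

# Decoding with the charset from the META tag (iso-8859-2) turns the
# two bytes of the letter "ż" into two unrelated Latin-2 characters.
wrong = raw.decode("iso-8859-2")

# Decoding the same bytes as Big5 treats the two bytes of "ż" as one
# double-byte character; errors="replace" avoids exceptions on invalid pairs.
as_big5 = raw.decode("big5", errors="replace")

print(right)
print(wrong)
print(as_big5)
```

The iso-8859-2 result shows the familiar "Ĺź" in place of "ż". The Big5 line suggests how Polish letters can turn into Chinese-looking characters when a Big5 setting carries over from another message, though whether that is what happens inside TB! is only a guess.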
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Marek,

> Screens were uploaded.

I wasn't able to upload them - tried a few times. Will try later.

SYSTEM OUTPUT below:

APPLICATION ERROR #FILE_MOVE_FAILED
Please use the Back button in your web browser to return to the previous page. There you can correct whatever problems were identified in this error or select another action. You can also click an option from the menu bar to go directly to a new section.

SYSTEM WARNING: move_uploaded_file() [function.move-uploaded-file]: Unable to move '/tmp/phpyLJewb' to 'files/tb/ccd67917f637eb2f95f2f086be3e6186'

File names:
BIG5 - Autodetect.png
BIG5 - Forced After Autodetect.png
BIG5_After_Changing_Folders__All_Fine.png
Polish message after BIG5 was switched to UTF8 and Polish message was browsed and BIG5 was switched to BIG5 again.png

The last one might be too long - are there any constraints on file names when uploading?

-- 
Best regards,
RS
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Raymund,

> The problem is likely that HTML encoding and MIME part encoding differ. TB! picks the wrong one and if you switch it is forced to use the correct one.

> TB uses META tag defined in HTML page, what is wrong with it?

> No need to defend TB! here :-)

I loved it when I stumbled onto v4 - it was the reason why I came back and purchased the license.

> Wrong was meant in the way that TB! didn't meet RS expectations not that the behaviour of TB! is wrong.

Unfortunately, that is the case in many other areas as well :(

The messed-up message coding is annoying, but I can live with it for some time - it's not a showstopper, though it should not happen in a commercial application. What worries me is the BIG5/UTF-8/Autodetect switching, which makes some messages unreadable despite the fact that in theory they are marked properly.

The truth is that, after folders disappeared in TB! while I was importing EML files from Thunderbird, sorting them, and creating folders and filters, I am afraid it can happen again and I would lose access to the data, or getting it back would take me a few days. That is apart from the fact that it took me two days (about 12 hours each) to do the above. The EML import took so long because I have the messages sorted into folders and subfolders, but the import does not scan subfolders for EML messages (or create the relevant folders), so all the work has to be done manually: creating a few hundred folders and importing into each of them by hand.

> We had that discussion earlier as I mentioned and I would expect TB! to take the meta tag encoding as well.

Probably before I joined TBBETA :(

-- 
Best regards,
RS
Re: Improper display of Polish characters coded in UTF-8 on Polish OS
Hi Jernej,

> TB uses META tag defined in HTML page, what is wrong with it?

> Shouldn't the charset defined in headers take precedence? At least in HTTP, the charset in HTTP headers always overrides any meta tags.

Marek is right here that TB!'s behaviour when decoding HTML is proper.

As far as I remember you can have parts of the page in one coding and the other part in other coding. Enforcing the coding from Header would mess up such messages - my second thought.

On that point I was wrong, and TB!'s approach of using the coding from the HTML body looks to be proper.

What worries me is the switching part: Chinese characters appear in the Polish message; the BIG5 message becomes unreadable despite the fact that in theory BIG5 is selected; and manually selecting UTF-8 on one message applies to all messages from that Polish sender, while at the same time the BIG5 message becomes unreadable.

-- 
Best regards,
RS