Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-18 Thread Jernej Simončič
On Wednesday, May 18, 2011, 7:53:39, RS wrote:

 As far as I remember, you can have parts of the page in one encoding
 and other parts in another encoding. Enforcing the encoding from the
 header would mess up such messages (my second thought).

I'm talking about the MIME part header, not the global header. Each
MIME section can have its own Content-Type header, so each part can
have its own character set.
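A minimal sketch of the per-part idea, using Python's standard email package. The message below is hypothetical (a UTF-8 plain part and an ISO-8859-2 HTML part inside one multipart/alternative); the point is only that each leaf part carries and is decoded with its own charset parameter.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def part_charsets(message):
    """Return (content type, declared charset, decoded text) for each leaf part."""
    result = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers only hold boundaries, not text
        charset = part.get_content_charset() or "us-ascii"
        # get_payload(decode=True) undoes base64/quoted-printable and
        # returns bytes, which are then decoded with this part's charset.
        text = part.get_payload(decode=True).decode(charset)
        result.append((part.get_content_type(), charset, text))
    return result


# Hypothetical message: the two parts declare different charsets.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("Oferta ważna codziennie", "plain", "utf-8"))
msg.attach(MIMEText("<p>Oferta ważna codziennie</p>", "html", "iso-8859-2"))

for content_type, charset, text in part_charsets(msg):
    print(content_type, charset, text)
```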

-- 
 Jernej Simončič  http://eternallybored.org/ 

[ The Bat! 5.0.12.2 on Windows 7 6.1.7601.Service Pack 1 ]

Much work, much food; little work, little food; no work, burial at sea.
   -- Cook's Law



 Current beta is 5.0.12.3 | 'Using TBBETA' information:
http://www.silverstones.com/thebat/TBUDLInfo.html


Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-18 Thread Thomas Fernandez
Hello Marek,

On Tue, 17 May 2011 21:23:19 +0200 GMT (18/May/11, 2:23 AM +0700 GMT),
Marek Mikus wrote:

 If it's an encoding problem, a manual switch to Chinese encoding
 and back to UTF-8 should not change anything, right?

 The problem is likely that HTML encoding and MIME part encoding
 differ. TB! picks the wrong one and if you switch it is forced to use
 the correct one.

MM TB uses the META tag defined in the HTML page; what is wrong with that?
MM Yes, the chars are displayed wrong, but this is the sender's client's
MM fault if it defines a different charset in the HTML tag than the one
MM used to encode the chars.

Maybe what you wrote only sounds weird over here, so I apologize in
advance. Obviously, I don't understand the technology. But when you
switch away to another random encoding and then back to UTF-8,
suddenly the message is displayed correctly. So if displaying it
incorrectly is correct, then displaying it correctly must certainly be
wrong?

As a user I can say that what appears to be wrong with TB!'s
behaviour is that it displayed the message wrong in the first place;
that's what I see without knowing anything about encoding.
certain that there is no email client in the world that can display
the message correctly *without* having to switch to another (random)
encoding and back? Is this standard behaviour according to RFC?

The statement that it's the sender's client's fault that you cannot
see their message in TB! correctly does sound a bit weird to my ears,
but I may be misunderstanding you or the whole situation. To ease
things, can I create a keyboard shortcut to switch to Chinese and back,
or better yet, a filter that makes it happen automatically?

MM It is possible to force TB to use the MIME header instead of the HTML
MM tag: add a DWORD registry value HtmlCharSetPriority with value 1 under
MM HKEY_CURRENT_USER\Software\RIT\The Bat! and the MIME charset will take
MM precedence over the HTML header charset.

I am certain that you do not expect any user to mess with the
registry.
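For anyone who would rather not click through regedit by hand, the value Marek describes can be packaged as a .reg file and imported by double-clicking it. A minimal Python sketch that builds such a file; the key path, value name and DWORD 1 come from Marek's post, while the helper names and the .reg-file approach are assumptions of this example and not something shipped with The Bat!.

```python
# Key path, value name and data are taken from Marek's post; the helpers
# themselves are hypothetical and not part of The Bat!.
KEY_PATH = r"HKEY_CURRENT_USER\Software\RIT\The Bat!"
VALUE_NAME = "HtmlCharSetPriority"


def reg_file_text(value: int = 1) -> str:
    """Build the text of a .reg file that sets VALUE_NAME to a DWORD."""
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        f'"{VALUE_NAME}"=dword:{value:08x}\r\n'
    )


def write_reg_file(path: str, value: int = 1) -> None:
    # Version 5.00 .reg files are normally saved as UTF-16 with a BOM;
    # newline="" keeps Python from turning "\r\n" into "\r\r\n" on Windows.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(reg_file_text(value))


if __name__ == "__main__":
    print(reg_file_text(1))
```

Writing value 0 instead of 1 should, going by Marek's description, put the META tag back in charge.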

-- 

Cheers,
Thomas.

http://thomas.fernandez.hat-gar-keine-homepage.de/

Message reply created with The Bat! 4.2.44
under Windows XP 5.1 Build 2600 Service Pack 3





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Raymund,

 I can understand that they have a problem with Chinese characters on
 a Polish/English OS, but improper display of Polish characters
 encoded in UTF-8 on a Polish OS is a bit too much.

 Please check if it is a HTML message that uses incorrect encoding
 information.

It's Text and HTML; part of the header is below.

Content-Type: multipart/related;
    type=text/html;
    boundary=b1_3219bfe0432f612b608e63a94c72116c
X-Spam-Status: No, score=-3.4 required=6.0 tests=ALL_TRUSTED,AWL,BAYES_00,
    EXTRA_MPART_TYPE,HTML_MESSAGE autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail.8a.pl

--b1_3219bfe0432f612b608e63a94c72116c
Content-Type: multipart/alternative;
    boundary=b2_3219bfe0432f612b608e63a94c72116c

--b2_3219bfe0432f612b608e63a94c72116c
Content-Type: text/plain; charset = UTF-8
Content-Transfer-Encoding: 8bit



//** Comment
--b1 - is text message
--b2 - is HTML message
**//


It's recognized by TB! as UTF-8, but you need to manually switch to
another encoding and back to UTF-8 to have it shown properly.

If it's an encoding problem, a manual switch to Chinese encoding and
back to UTF-8 should not change anything, right?

-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Raymund,

 I can understand that they have a problem with Chinese characters on
 a Polish/English OS, but improper display of Polish characters
 encoded in UTF-8 on a Polish OS is a bit too much.

 Please check if it is a HTML message that uses incorrect encoding
 information.

If it's an encoding problem, then it is another bug that TB! shows the
message properly after switching the encoding to something else and back
to UTF-8.

-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Raymund,

 If it's an encoding problem, a manual switch to Chinese encoding
 and back to UTF-8 should not change anything, right?

 The problem is likely that HTML encoding and MIME part encoding
 differ. TB! picks the wrong one and if you switch it is forced to use
 the correct one.

 What happens if you force TB! to hide the HTML part? Is the text part
 shown correctly?

In Text mode, Polish characters are shown properly.

But there is another problem, though I am not sure whether it's proper
behaviour in TB!. It looks like these guys mixed &nbsp; into the Text
part.

They use &nbsp; in the Text part as a space between words to keep
them together, e.g.:

Hit  Dnia  to  nasza  oferta  specjalna,nbsp;ważna codziennienbsp;do
godziny24:00.nbsp;Dziękisubskrypcjinaszegonewslettera
będziesznbsp;zawsze...

And that is probably their mistake.
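Why this is the sender's mistake can be shown in a few lines of Python. The fragment is modelled on the sample above, assuming the archive dropped the leading "&" and the sender really wrote "&nbsp;" entities; html.unescape stands in for what an HTML renderer does.

```python
import html

# Hypothetical fragment based on the sample above (assumes "&nbsp;"
# entities whose "&" was lost in the archive).
plain_part = "oferta specjalna,&nbsp;ważna codziennie&nbsp;do godziny 24:00."

# A text/plain part is displayed as-is: no entity decoding takes place,
# so the reader sees the literal characters "&nbsp;".
shown_in_text_mode = plain_part

# Only an HTML renderer turns the entity into U+00A0 NO-BREAK SPACE.
shown_in_html_mode = html.unescape(plain_part)

print(shown_in_text_mode)
print(repr(shown_in_html_mode))
```

In other words, the entity only means "non-breaking space" inside HTML; copying it into the plain-text alternative leaves it as visible text.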

-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread Jernej Simončič
On Tuesday, May 17, 2011, 21:23:19, Marek Mikus wrote:

 TB uses the META tag defined in the HTML page; what is wrong with that?

Shouldn't the charset defined in headers take precedence? At least in
HTTP, the charset in HTTP headers always overrides any meta tags.
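The two precedence rules under discussion can be written down as a tiny decision function. This is a sketch of the rules as described in this thread, not TB!'s actual code: mime_first=True models the HTTP-style rule (and what HtmlCharSetPriority=1 is said to do), mime_first=False models the META-first default Marek describes. The function name and the US-ASCII fallback are assumptions.

```python
from typing import Optional


def effective_charset(mime_charset: Optional[str],
                      meta_charset: Optional[str],
                      mime_first: bool) -> str:
    """Pick the charset used to decode one HTML part.

    mime_first=True:  the part's Content-Type charset wins (HTTP-style).
    mime_first=False: the <meta> charset wins (META-first behaviour).
    """
    order = (mime_charset, meta_charset) if mime_first else (meta_charset, mime_charset)
    for candidate in order:
        if candidate:
            return candidate.strip().lower()
    # Assumed fallback: MIME's default charset for text/plain is US-ASCII.
    return "us-ascii"


# Hypothetical HTML part: MIME header says UTF-8, META tag says ISO-8859-2.
print(effective_charset("UTF-8", "iso-8859-2", mime_first=True))
print(effective_charset("UTF-8", "iso-8859-2", mime_first=False))
```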

-- 
 Jernej Simončič  http://eternallybored.org/ 

[ The Bat! 5.0.12.2 on Windows 7 6.1.7601.Service Pack 1 ]

If God had intended us to go around naked, He would have made us that way.
   -- Olum's Observation (and see Martha's Maxim and Farrow's Finding)





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Dear Marek,

RS If it's an encoding problem, a manual switch to Chinese encoding
RS and back to UTF-8 should not change anything, right?

Raymund The  problem  is  likely  that  HTML  encoding and MIME part
Raymund encoding  differ.  TB! picks the wrong one and if you switch
Raymund it is forced to use the correct one.

Marek TB uses the META tag defined in the HTML page; what is wrong with that?
I am sorry, but there is something wrong with the display result.

If TB! uses the encoding from the HTML META tag for display, and
HEADER=UTF-8, HTML_META=iso-8859-2 and the characters are actually
CODED=UTF-8, then a manual switch would work as you described (TB! is
forced to ignore the HEADER and use HTML_META).
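What that mismatch looks like on screen can be reproduced with one word from the sample text. This sketch simply encodes "ważna" as UTF-8 (as the sender did) and decodes the bytes as ISO-8859-2 (as a META-first decoder would for this message); the round trip at the end shows why forcing UTF-8 repairs it.

```python
word = "ważna"                      # taken from the sample text in this thread
raw = word.encode("utf-8")          # the bytes the sender actually put in the mail
print(raw)                          # b'wa\xc5\xbcna'

# Decoding UTF-8 bytes as ISO-8859-2: the two bytes of "ż" (C5 BC)
# become two separate Latin-2 letters.
garbled = raw.decode("iso-8859-2")
print(garbled)                      # waĹźna

# Re-encoding with the wrong charset and decoding as UTF-8 undoes it.
repaired = garbled.encode("iso-8859-2").decode("utf-8")
print(repaired)                     # ważna
```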

But that forcing of UTF-8 should apply to only one message, not to all
messages in the folder. After a manual switch on one message from the
sender, all messages from that sender in the same folder are shown
properly. So, after a switch on one message, does TB! ignore the HEADER
on all messages from the same sender?
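The per-message scope RS expects can be sketched as an override table keyed by Message-ID. This is a hypothetical model of the expected behaviour, not a description of how TB! stores View/Charset choices; the class name and Message-IDs are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CharsetOverrides:
    """Hypothetical per-message store for manual View/Charset choices.

    Keying by Message-ID means a manual choice sticks to the one message
    it was made on; other messages from the same sender or folder keep
    using their own declared charset.
    """
    by_message_id: Dict[str, str] = field(default_factory=dict)

    def override(self, message_id: str, charset: str) -> None:
        self.by_message_id[message_id] = charset.lower()

    def resolve(self, message_id: str, declared: Optional[str]) -> Optional[str]:
        # A manual override for this exact message wins; otherwise use
        # whatever the message itself declares.
        return self.by_message_id.get(message_id, declared)


overrides = CharsetOverrides()
overrides.override("<polish-1@example.com>", "UTF-8")
print(overrides.resolve("<polish-1@example.com>", "iso-8859-2"))  # utf-8
print(overrides.resolve("<polish-2@example.com>", "iso-8859-2"))  # iso-8859-2
print(overrides.resolve("<big5-1@example.com>", "big5"))          # big5
```

Under this model, fixing one Polish message would neither change the sender's other messages nor touch the BIG5 message in the same folder.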

In the same folder I have a message encoded in BIG5, and when I switch
the Polish message to UTF-8, the BIG5 message is messed up. Its headers:
Content-transfer-encoding: base64
Content-Type: text/html;
    charset=BIG5

Does View/Charset apply to a whole folder instead of a single
message?


When you go through the menu to Char Set on the Chinese message, it
still shows BIG5 encoding, not UTF-8. After a manual switch to
something else and back to BIG5, I can see the Chinese characters. But
after switching on the BIG5 message, the Polish sender's messages show
Chinese characters in place of Polish letters. After a manual switch on
the Polish message to UTF-8, the Chinese is messed up again (although
the view on it is still set to BIG5).


AUTODETECT
Selecting Autodetect on the BIG5 message makes the From, To, Sender
(that one is sometimes OK, sometimes messed up as well), Mailer and
Message ID totally unreadable. When, after Autodetect was selected, I
manually set the Chinese message to BIG5, it is shown properly, but the
Polish characters in the Polish message are Chinese (with UTF-8 encoding
shown for that message).

Any idea why switching causes such behaviour?
Is it proper behaviour or rather a malfunction?

Screens were uploaded.


Marek Yes, the chars are displayed wrong, but this is the sender's
Marek client's fault,
The characters are encoded in UTF-8 (checked with a binary editor);
HTML_META points to iso-8859-2, but if TB! uses that info, why are
Chinese characters shown in the Polish message?

Marek if it defines a different charset in the HTML tag than the one
Marek used for encoding the chars.

You're right here; they messed up the message a bit, and that's good,
because that way I was able to find out that, at least for BIG5,
Autodetect is not working properly.

MESSED UP MESSAGE
a) the header shows usage of UTF-8:
--b2_3219bfe0432f612b608e63a94c72116c
Content-Type: text/plain; charset = UTF-8
Content-Transfer-Encoding: 8bit

b) the HTML body points to iso-8859-2:
<meta content="text/html; charset=iso-8859-2" http-equiv="Content-Type">
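A conflict like this one can be spotted mechanically by reading the charset out of the META tag and comparing it with the MIME charset. A minimal sketch using Python's standard html.parser and codecs modules; the function and class names are hypothetical, and it only handles the two common META forms (http-equiv Content-Type and HTML5 charset).

```python
import codecs
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Remember the first charset declared by a <meta> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        # HTMLParser lower-cases tag and attribute names; valueless
        # attributes arrive as None, so map them to "".
        a = {name: (value or "") for name, value in attrs}
        if a.get("charset"):  # HTML5 form: <meta charset="...">
            self.charset = a["charset"].strip()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html; charset=..."
            for param in a.get("content", "").split(";")[1:]:
                key, _, value = param.partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'")
                    break


def charset_conflict(mime_charset: str, html_text: str) -> Optional[str]:
    """Return the META charset if it names a different codec than the MIME one."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    if finder.charset is None:
        return None
    # codecs.lookup() folds aliases ("UTF8", "utf-8", "ISO-8859-2", ...)
    # onto one canonical codec name; it raises LookupError for unknown names.
    if codecs.lookup(finder.charset).name != codecs.lookup(mime_charset).name:
        return finder.charset
    return None


meta = '<meta content="text/html; charset=iso-8859-2" http-equiv="Content-Type">'
print(charset_conflict("UTF-8", meta))       # iso-8859-2
print(charset_conflict("ISO-8859-2", meta))  # None
```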


Marek It is possible to force TB to use the MIME header instead of the
Marek HTML tag: add a DWORD registry value HtmlCharSetPriority with
Marek value 1 under HKEY_CURRENT_USER\Software\RIT\The Bat! and the MIME
Marek charset will take precedence over the HTML header charset.

Thank you for that info.


-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Marek,

 Screens were uploaded.
I wasn't able to upload them; I tried a few times.
I will try later.

SYSTEM OUTPUT below:


APPLICATION ERROR #FILE_MOVE_FAILED

Please  use  the  Back  button  in your web browser to return to the
previous page. There you can correct whatever problems were identified
in  this  error or select another action. You can also click an option
from the menu bar to go directly to a new section.


SYSTEM  WARNING:  move_uploaded_file()  [function.move-uploaded-file]:
Unable  to  move  '/tmp/phpyLJewb'  to
'files/tb/ccd67917f637eb2f95f2f086be3e6186'  


File names:
BIG5 - Autodetect.png
BIG5 - Forced After Autodetect.png
BIG5_After_Changing_Folders__All_Fine.png
Polish message after BIG5 was switched to UTF8 and Polish message was browsed and BIG5 was switched to BIG5 again.png

The last one might be too long. Are there any constraints on file
names when uploading?

-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Raymund,

 The problem is likely that HTML encoding and MIME part encoding
 differ. TB! picks the wrong one and if you switch it is forced to use
 the correct one.
 TB uses the META tag defined in the HTML page; what is wrong with that?

 No need to defend TB! here :-)
I loved it when I stumbled onto v4; it was the reason I came back and
purchased the license.

 'Wrong' was meant in the sense that TB! didn't meet RS's expectations,
 not that the behaviour of TB! is wrong.
Unfortunately, it fails to meet them in many other areas too :(
That messed-up message encoding is annoying, but I can live with it for
some time; it's not a showstopper, but it should not happen in a
commercial application. What worries me is the BIG5/UTF-8/Autodetect
switching issue, which makes some messages unreadable even though, in
theory, they are marked properly.


The truth is that after folders disappeared in TB! while I was
importing EML from Thunderbird, sorting them and creating folders and
filters, I am afraid it can happen again and I would lose access to the
data, or getting it back would take me a few days.

Not to mention that it took me two days (about 12 hours each) to do
the above.

The EML import took so long because I have the messages sorted into
folders and subfolders, but the import does not scan subfolders for EML
messages (or create the corresponding folders), so you need to do all
the work manually (creating a few hundred folders and importing into
each of them by hand).
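Until the importer scans subfolders itself, a short script can at least list which folders a recursive import would have to recreate and which EML files belong in each. This sketch only walks the file system with pathlib; it does not talk to The Bat!, the function names are hypothetical, and the "*.eml" pattern is case-sensitive on Linux, so files named ".EML" would need an extra pattern.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def eml_by_folder(root: Path) -> Dict[str, List[Path]]:
    """Group every *.eml file under root by its folder relative to root."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for eml in sorted(root.rglob("*.eml")):
        # "." stands for the root folder itself.
        rel_folder = eml.parent.relative_to(root).as_posix()
        groups[rel_folder].append(eml)
    return dict(groups)


def demo() -> Dict[str, List[str]]:
    """Build a throwaway tree and return folder -> EML file names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Work" / "2010").mkdir(parents=True)
        for name in ("inbox.eml", "Work/a.eml", "Work/2010/b.eml", "Work/notes.txt"):
            (root / name).write_text("")
        return {k: [p.name for p in v] for k, v in eml_by_folder(root).items()}


if __name__ == "__main__":
    for folder, names in sorted(demo().items()):
        print(folder, len(names))
```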

 We had that discussion earlier, as I mentioned, and I would expect TB!
 to take the meta tag encoding as well.

Probably before I joined TBBETA :(

-- 
Best regards,
RS





Re: Improper display of Polish characters coded in UTF-8 on Polish OS

2011-05-17 Thread RS
Hi Jernej,

 TB uses the META tag defined in the HTML page; what is wrong with that?

 Shouldn't the charset defined in headers take precedence? At least in
 HTTP, the charset in HTTP headers always overrides any meta tags.

Marek is right here that TB!'s behaviour when decoding HTML is proper.

As far as I remember, you can have parts of the page in one encoding
and other parts in another encoding. Enforcing the encoding from the
header would mess up such messages (my second thought).

On that point I was wrong, and TB!'s approach of using the encoding
from the HTML body looks proper.


What worries me is the switching issue: Chinese characters appear in
the Polish message; BIG5 becomes unreadable even though, in theory,
BIG5 is selected; and manually selecting UTF-8 on one message applies
it to all messages from that Polish sender, but at the same time, once
it is selected, BIG5 becomes unreadable.

-- 
Best regards,
RS


