charset table and how to use them

2004-06-24 Thread Shawn Walker
How can I utilize c-client charset table to convert characters?
I know that utf8_text() can convert the characters, but I'm having mixed  
results.

Thanks,
Shawn
--
--
For information about this mailing list, and its archives, see: 
http://www.washington.edu/imap/c-client-list.html
--


Re: charset table and how to use them

2004-06-24 Thread Mark Crispin
On Thu, 24 Jun 2004, Shawn Walker wrote:
> How can I utilize c-client charset table to convert characters?
> I know that utf8_text() can convert the characters, but I'm having mixed
> results.
What, exactly, are you trying to do?
utf8_text() is the routine to convert from arbitrary character sets into 
UTF-8 (normalized with pre-composed characters).  The new utf8_cstext() 
routine will convert normalized pre-composed UTF-8 into most character 
sets (as best it can; Greek text doesn't convert well into Chinese...).
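For the simplest case, ISO-8859-1, both directions can be sketched without c-client at all. This is a standalone illustration, not c-client's implementation. The names latin1_to_utf8() and utf8_to_latin1() are hypothetical. The reverse direction substitutes a caller-chosen error character for anything outside U+0000..U+00FF, which is the "as best it can" case above. It handles malformed and truncated sequences but does not reject overlong encodings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not c-client): ISO-8859-1 -> UTF-8.
 * Each Latin-1 byte is the Unicode code point of the same value, so
 * bytes < 0x80 are copied and the rest become two-byte sequences.
 * Caller frees the result. */
char *latin1_to_utf8(const unsigned char *src, size_t len)
{
    char *out = malloc(2 * len + 1);   /* worst case: every byte doubles */
    size_t i, j = 0;
    if (!out) return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) out[j++] = (char)c;
        else {
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return out;
}

/* Hypothetical helper: UTF-8 -> ISO-8859-1. Characters outside
 * U+0000..U+00FF, stray bytes and truncated sequences become errch.
 * Each step consumes at least one input byte and emits one output byte,
 * so the result is never longer than the input. Caller frees. */
unsigned char *utf8_to_latin1(const char *src, unsigned char errch, size_t *outlen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t len = strlen(src), i = 0, j = 0;
    unsigned char *out = malloc(len + 1);
    if (!out) return NULL;
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t n, k;
        if (c < 0x80)                { cp = c;        n = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; n = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; n = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; n = 4; }
        else { out[j++] = errch; i++; continue; }   /* stray byte */
        /* Gather continuation bytes; stop early on a bad or missing one. */
        for (k = 1; k < n && i + k < len && (s[i + k] & 0xC0) == 0x80; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);
        out[j++] = (k == n && cp <= 0xFF) ? (unsigned char)cp : errch;
        i += k;
    }
    out[j] = '\0';
    *outlen = j;
    return out;
}
```

Greek alpha (U+03B1) fed to utf8_to_latin1() comes back as the error character, since ISO-8859-1 has no slot for it.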

To do conversion from one non-UTF-8 character set into another non-UTF-8 
character set, you can use the new utf8_cstocstext() routine (I forget if 
this made it into imap-2004, but it's in imap-2004a).  You can do things 
faster and with less memory if you set up the conversion tables yourself 
using utf8_rmap() -- Pine does this; look at the routines in strings.c and 
filter.c in the Pine sources.
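The table-driven idea can be sketched as follows. This does not call utf8_rmap(), whose exact signature isn't given here. Instead it builds the same kind of reverse map by hand for ISO-8859-15: an array indexed by Unicode code point that yields the target-charset byte. All table and function names are hypothetical. ISO-8859-1 bytes are already Unicode code points, so once the map exists, ISO-8859-1 to ISO-8859-15 is one lookup per byte, with no intermediate UTF-8 string.

```c
#include <stddef.h>

#define NOCHAR 0xFFFF   /* marks "no equivalent in the target charset" */

/* Forward table: ISO-8859-15 byte -> Unicode. It matches ISO-8859-1
 * except at the eight positions patched in build_tables(). */
static unsigned short iso8859_15_to_ucs[256];

/* Reverse map: BMP code point -> ISO-8859-15 byte, or NOCHAR. */
static unsigned short ucs_to_iso8859_15[0x10000];

static void build_tables(void)
{
    static const struct { unsigned char byte; unsigned short ucs; } diff[] = {
        {0xA4, 0x20AC}, {0xA6, 0x0160}, {0xA8, 0x0161}, {0xB4, 0x017D},
        {0xB8, 0x017E}, {0xBC, 0x0152}, {0xBD, 0x0153}, {0xBE, 0x0178}
    };
    size_t i;
    for (i = 0; i < 256; i++) iso8859_15_to_ucs[i] = (unsigned short)i;
    for (i = 0; i < sizeof diff / sizeof diff[0]; i++)
        iso8859_15_to_ucs[diff[i].byte] = diff[i].ucs;
    /* Invert the forward table; unlisted code points stay NOCHAR. */
    for (i = 0; i < 0x10000; i++) ucs_to_iso8859_15[i] = NOCHAR;
    for (i = 0; i < 256; i++)
        ucs_to_iso8859_15[iso8859_15_to_ucs[i]] = (unsigned short)i;
}

/* ISO-8859-1 -> ISO-8859-15: each source byte is its own code point,
 * so it indexes the reverse map directly. Unmappable bytes become errch. */
static void latin1_to_iso8859_15(const unsigned char *src, size_t len,
                                 unsigned char *dst, unsigned char errch)
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned short b = ucs_to_iso8859_15[src[i]];
        dst[i] = (b == NOCHAR) ? errch : (unsigned char)b;
    }
}
```

The Latin-1 currency sign (0xA4, U+00A4) has no ISO-8859-15 equivalent because that slot now holds the euro sign, so it maps to the error character. Building the map costs something up front, which pays off when many strings are converted with the same map.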

-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.


Re: checking for new mail in all mailboxes

2004-06-24 Thread Mark Crispin
On Thu, 24 Jun 2004, David Feldman wrote:
> How does one (at both the IMAP command and c-client level) check for new mail
> efficiently in all a user's mailboxes at once?
The short answer is: you can't.
The longer answer is:
Identify a set of mailboxes which merit further probing, and focus your 
checking on them.

In a strict check-all-mailboxes environment, do a LIST and note which
mailboxes come back with \Marked status (if you're paranoid, choose
the mailboxes which don't have \Unmarked status).  Then do a STATUS on
each of these to check them further, or just SELECT them if the user wants
them opened.
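The LIST-then-STATUS filtering step might look like this in C. It is a sketch that assumes the untagged LIST responses have already been read as lines of the form `* LIST (flags) "delimiter" name`. worth_probing() and status_command() are hypothetical helpers. Flags are compared case-insensitively to be lenient, and status_command() assumes the mailbox name is a plain atom that needs no quoting.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive test: does the space-separated flag list
 * list[0..len) contain flag as a whole token? */
static int has_flag(const char *list, size_t len, const char *flag)
{
    size_t flen = strlen(flag), i = 0;
    while (i < len) {
        size_t start, k;
        while (i < len && list[i] == ' ') i++;     /* skip separators */
        start = i;
        while (i < len && list[i] != ' ') i++;     /* scan one token */
        if (i - start != flen) continue;
        for (k = 0; k < flen; k++)
            if (tolower((unsigned char)list[start + k]) !=
                tolower((unsigned char)flag[k])) break;
        if (k == flen) return 1;
    }
    return 0;
}

/* Hypothetical helper: given one untagged LIST response such as
 *   * LIST (\Marked \HasNoChildren) "/" INBOX
 * decide whether the mailbox merits a STATUS probe. Normally that means
 * \Marked; in paranoid mode it means "not \Unmarked". */
static int worth_probing(const char *line, int paranoid)
{
    const char *open = strchr(line, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    size_t len;
    if (!close) return 1;          /* unparseable: probe it to be safe */
    len = (size_t)(close - open - 1);
    if (paranoid) return !has_flag(open + 1, len, "\\Unmarked");
    return has_flag(open + 1, len, "\\Marked");
}

/* Hypothetical helper: format the follow-up STATUS command. */
static int status_command(char *buf, size_t n, const char *tag, const char *mbx)
{
    return snprintf(buf, n, "%s STATUS %s (MESSAGES RECENT UNSEEN)\r\n", tag, mbx);
}
```

A mailbox returned with no flags at all is skipped in normal mode but probed in paranoid mode, since it is not explicitly \Unmarked.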

Alternatively, distinguish between incoming mailboxes and mailboxes
which are strictly archives.  Don't even consider the archive mailboxes,
which for most users greatly outnumber the incoming mailboxes.  If you
have a reasonable number of incoming mailboxes, then just keep each of
them SELECTed in its own IMAP session; this is the most efficient, most
real-time, and least costly way to monitor a set of mailboxes.

Put another way: 5 IMAP sessions monitoring 5 mailboxes is less costly
(often *MUCH* less costly) than repeatedly probing those 5 mailboxes in
one IMAP session.  Sessions are cheap, especially if you use the IDLE
command.  Polls are not cheap, especially with mail stores that oblige the
server to parse the entire mailbox to satisfy a STATUS.

-- Mark --


Re: charset table and how to use them

2004-06-24 Thread Shawn Walker
On Thu, 24 Jun 2004 11:42:26 -0700 (PDT), Mark Crispin  
[EMAIL PROTECTED] wrote:

> On Thu, 24 Jun 2004, Shawn Walker wrote:
>>> To do conversion from one non-UTF-8 character set into another
>>> non-UTF-8 character set, you can use the new utf8_cstocstext() routine
>>> (I forget if this made it into imap-2004, but it's in imap-2004a).
>>> You can do things faster and with less memory if you set up the
>>> conversion tables yourself using utf8_rmap() -- Pine does this; look
>>> at the routines in strings.c and filter.c in the Pine sources.
>>
>> Basically convert ISO-8859-1, UTF-8, ISO-8859-15, etc. characters to
>> whatever I need in order to display the characters.
> Unless you are writing a text-based client for UNIX, you should convert
> everything into UTF-8 and use exclusively Unicode for display.  Even if
> you are writing a text-based client for UNIX, you should still consider
> using Unicode (UTF-8 is just a means of representing Unicode) as newer
> versions of UNIX now support UTF-8.
>
> The only purpose for any other character set is to accept data in the
> other character set in incoming mail and files (and possibly from the
> user's keyboard -- although Unicode is preferred here too), and if
> necessary to send mail in a non-Unicode character set (although this is
> doomed to deprecation).
>
> Put another way, most programs should only need utf8_text() and
> utf8_cstext().
>
> Or, if you feel that you need to be able to convert ISO-8859-15 to
> KOI8-R or ISO-2022-JP or BIG5, you are probably doing something wrong.

The program isn't running on unix.  It's running on Windows with Outlook  
(I know, bear with me. ;)

I have a string Iñtërnâtiônàlizætiøn that I need to encode before  
putting it in the body contents of BODY.  I don't have utf8_cstocstext(),  
but would that function do what I need to do?  I tried utf8_cstext(), but
it didn't do anything (I passed UTF-8 for the charset).
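One reading of this question, offered as a hedged sketch rather than as the thread's answer: the string is already UTF-8, so asking utf8_cstext() for UTF-8 output leaves nothing to convert. What such a body usually still needs is a charset=UTF-8 label and a 7-bit-safe transfer encoding such as quoted-printable (RFC 2045, section 6.7). The encoder below is a standalone illustration. qp_encode() is a hypothetical name, not a c-client routine, and it assumes CRLF line endings. For example, ñ (U+00F1, UTF-8 bytes C3 B1) comes out as =C3=B1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a c-client routine): quoted-printable encoder.
 * CRLF pairs pass through as hard line breaks; output lines are kept to
 * at most 76 characters with "=" soft line breaks; space or tab right
 * before a line end (or end of input) is encoded. Caller frees. */
char *qp_encode(const unsigned char *src, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    /* Each byte expands to at most 3 chars, plus a 3-char soft break at
       most every 25 input bytes; 4 * len + 3 covers that and the NUL. */
    char *out = malloc(4 * len + 3);
    size_t i, j = 0, col = 0;
    if (!out) return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c == '\r' && i + 1 < len && src[i + 1] == '\n') {
            out[j++] = '\r'; out[j++] = '\n';      /* hard line break */
            i++; col = 0;
            continue;
        }
        int before_eol = (i + 1 == len) ||
            (i + 2 < len && src[i + 1] == '\r' && src[i + 2] == '\n');
        int literal = (c >= 33 && c <= 126 && c != '=') ||
                      ((c == ' ' || c == '\t') && !before_eol);
        size_t width = literal ? 1 : 3;
        if (col + width > 75) {                    /* room for trailing "=" */
            out[j++] = '='; out[j++] = '\r'; out[j++] = '\n';
            col = 0;
        }
        if (literal) out[j++] = (char)c;
        else {
            out[j++] = '=';
            out[j++] = hex[c >> 4];
            out[j++] = hex[c & 0x0F];
        }
        col += width;
    }
    out[j] = '\0';
    return out;
}
```

Breaking at 75 visible characters rather than 76 is slightly conservative but keeps every line, including its soft-break "=", within the 76-character limit.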

Thanks,
Shawn