Re: The lib email parse problem...

2006-08-30 Thread Tim Roberts
 [EMAIL PROTECTED] wrote:

I know how to use the email module.

The question is about how to handle the RFC 1521 MIME
multipart/alternative part.

I know email can handle multipart, but the alternative subpart is
special.

No, it's not.  A multipart/alternative section is constructed exactly the
same as any other multipart section.  It just so happens that it will have
exactly two subsections, one text/plain and one text/html.
-- 
- Tim Roberts, [EMAIL PROTECTED]
  Providenza  Boekelheide, Inc.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: The lib email parse problem...

2006-08-30 Thread John Machin

Tim Roberts wrote:
  [EMAIL PROTECTED] wrote:

 I know how to use the email module.
 
 The question is about how to handle the RFC 1521 MIME
 multipart/alternative part.
 
 I know email can handle multipart, but the alternative subpart is
 special.

 No, it's not.  A multipart/alternative section is constructed exactly the
 same as any other multipart section.  It just so happens that it will have
 exactly two subsections, one text/plain and one text/html.

I was under the impression that it was a little more general than that
... see e.g. http://www.freesoft.org/CIE/RFC/1521/18.htm

My guess is that the OP meant special in the sense that the reader
needs to choose one subpart, instead of processing all subparts.

Cheers,
John
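A minimal sketch of John's point, in current Python 3 (the thread itself uses Python 2). RFC 1521 asks composers to order the subparts of a multipart/alternative by increasing preference, with the preferred format last, so a reader takes the last subpart it can display. The helper name choose_alternative, its can_display callback and the three-part sample message are hypothetical; the sample also shows that an alternative is not limited to one text/plain plus one text/html part.

```python
import email

# Hypothetical sample: a multipart/alternative with three renderings.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

plain version
--XYZ
Content-Type: text/enriched; charset=us-ascii

<bold>enriched</bold> version
--XYZ
Content-Type: text/html; charset=us-ascii

<b>html</b> version
--XYZ--
"""


def choose_alternative(part, can_display):
    """Return the last subpart of a multipart/alternative whose content
    type satisfies can_display(), or None if no subpart does.

    Alternatives are listed in increasing order of preference, so the
    last displayable one is the best choice."""
    best = None
    for subpart in part.get_payload():  # list of Message objects
        if can_display(subpart.get_content_type()):
            best = subpart
    return best


msg = email.message_from_string(RAW)
print(len(msg.get_payload()))  # 3
plain_only = choose_alternative(msg, lambda ct: ct == "text/plain")
any_text = choose_alternative(msg, lambda ct: ct.startswith("text/"))
print(plain_only.get_content_type())  # text/plain
print(any_text.get_content_type())    # text/html
```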






Re: The lib email parse problem...

2006-08-30 Thread 叮叮当当
Yes, what's special is that I must choose exactly one section to process,
instead of processing all subparts.



Re: The lib email parse problem...

2006-08-30 Thread John Machin
On 30/08/2006 4:44 PM, 叮叮当当 wrote:
 Yes, what's special is that I must choose exactly one section to process,
 instead of processing all subparts.

So have you tried to use the example I posted yesterday? Do you still 
have any problems? Note: it is generally a good idea to post a message 
when you have overcome a problem -- that lets would-be helpers know that 
they are off the case :-)

Cheers,
John

Re: The lib email parse problem...

2006-08-30 Thread neoedmund
I wrote a multipart parser myself in Java (I customized it because I needed
to get upload-progress information), and I think it would also be easy to
implement in Python; I haven't had time to do it, or I'd post it.
But if you have no other special needs, just use the email lib: it's quick
to program, and if you really don't need some part, just drop it.
Is there anything wrong with the email lib?


Re: The lib email parse problem...

2006-08-30 Thread 叮叮当当
Thanks.

I have used a temporary workaround to get past it.

I still think the email lib should tell you where a boundary ends when
parsing mail.

The code is as follows:

def parse_mail_content(self, mail):
    content = ''
    alter = False
    subty = ''
    html = ''
    plain = ''
    for part in mail.walk():
        if part.is_multipart():
            if part.get_content_subtype() == 'alternative':
                alter = True
            else:
                alter = False
            continue
        if part.get_content_maintype() == 'text':
            if part.get_filename():
                continue
            ty = part.get_content_subtype()
            ch = part.get_content_charset()
            if alter and ty == 'plain':
                subty = 'plain'
                if ch:
                    plain = unicode(part.get_payload(decode=True), ch).encode('utf-8')
                else:
                    plain = part.get_payload(decode=True).decode('gb2312').encode('utf-8')
            elif alter and ty == 'html':
                subty = 'html'
                if ch:
                    html = unicode(part.get_payload(decode=True), ch).encode('utf-8')
                else:
                    html = part.get_payload(decode=True).decode('gb2312').encode('utf-8')
            elif not alter:
                if subty == 'html':
                    content += html
                elif subty == 'plain':
                    content += plain
                alter = False
                subty = ''
                if ch:
                    content += unicode(part.get_payload(decode=True), ch).encode('utf-8')
                else:
                    content += part.get_payload(decode=True).decode('gb2312').encode('utf-8')
        elif alter:
            if subty == 'html':
                content += html
            elif subty == 'plain':
                content += plain
            alter = False
            subty = ''
    if alter:
        if subty == 'html':
            content += html
        elif subty == 'plain':
            content += plain
    return content

Thanks very much.


Re: The lib email parse problem...

2006-08-30 Thread Fredrik Lundh
 wrote:

 I have used a temporary workaround to get past it.

 I still think the email lib should tell you where a boundary ends when
 parsing mail.

the email lib you're using is a PARSER, and it's already PARSING the
mail for you.

(if you have trouble structuring your program when someone else is doing
the parsing for you, what makes you think it would be easier if you had to
do the parsing yourself as well ?)

wrt. your temp method, I think you'll find that a recursive solution would
be a lot easier to get right without having to resort to code duplication like
in your example; I think John Machin posted an example earlier in this
thread.

/F
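Fredrik's suggestion of a recursive solution, sketched in Python 3 under stated assumptions: extract_text is a hypothetical helper, the sample message is invented, an attachment is recognized only by carrying a filename, and the gb2312 fallback is copied from the OP's code. Instead of tracking state across the flat walk() sequence, it descends the tree and takes a single subpart from each multipart/alternative, so no boundary bookkeeping is needed.

```python
import email

# Hypothetical sample: mixed[alternative[plain, html], plain footer, attachment]
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset=us-ascii

hello plain
--INNER
Content-Type: text/html; charset=us-ascii

<p>hello html</p>
--INNER--

--OUTER
Content-Type: text/plain; charset=us-ascii

footer
--OUTER
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="data.bin"
Content-Transfer-Encoding: base64

AAEC
--OUTER--
"""


def extract_text(part, prefer_html=True):
    """Return a list of decoded text bodies, taking exactly one subpart
    from every multipart/alternative encountered."""
    if part.is_multipart():
        children = part.get_payload()
        if part.get_content_type() == "multipart/alternative":
            wanted = {"text/plain", "text/html"} if prefer_html else {"text/plain"}
            chosen = None
            for child in children:
                # The last acceptable alternative wins. Assumption: a nested
                # multipart child is the richer rendering, so it is accepted
                # only when HTML is preferred.
                if child.get_content_type() in wanted or (prefer_html and child.is_multipart()):
                    chosen = child
            children = [chosen] if chosen is not None else []
        texts = []
        for child in children:
            texts.extend(extract_text(child, prefer_html))
        return texts
    # Leaf part: skip non-text content and anything carrying a filename.
    if part.get_content_maintype() != "text" or part.get_filename():
        return []
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "gb2312"  # fallback copied from the OP
    return [raw.decode(charset, errors="replace")]


msg = email.message_from_string(SAMPLE)
print(extract_text(msg))
print(extract_text(msg, prefer_html=False))
```

Because each call only sees one subtree, the end of an alternative is simply the end of its get_payload() list.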





Re: The lib email parse problem...

2006-08-29 Thread Fredrik Lundh
 wrote:

 When an email body consists of a multipart/alternative part, I must know
 when the boundary ends in order to parse it,

or use a library that understands multipart messages.

 but the email lib does not provide any function to indicate the
 boundary end. How do I solve this?

http://docs.python.org/lib/module-email.Parser.html

/F 
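For reference, in current Python 3 the parser Fredrik links to lives in email.parser. A minimal sketch with a hypothetical two-part message: Parser.parsestr() already splits the body at every boundary and returns a tree of Message objects, and email.message_from_string(), used elsewhere in this thread, is a convenience wrapper around the same parser.

```python
from email.parser import Parser

# Hypothetical two-part message.
RAW = """\
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="B"

--B
Content-Type: text/plain

plain
--B
Content-Type: text/html

<p>html</p>
--B--
"""

# A multipart part's payload is the list of its child Message objects.
msg = Parser().parsestr(RAW)
print(msg.is_multipart())                                 # True
print([p.get_content_type() for p in msg.get_payload()])  # ['text/plain', 'text/html']

# headersonly=True stops after the headers and keeps the body as one string.
head = Parser().parsestr(RAW, headersonly=True)
print(head["Subject"], head.is_multipart())               # demo False
```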





Re: The lib email parse problem...

2006-08-29 Thread John Machin
叮叮当当 wrote:
 Hi, all,

 When an email body consists of a multipart/alternative part, I must know
 when the boundary ends in order to parse it,

 but the email lib does not provide any function to indicate the
 boundary end. How do I solve this?

By reading the manual.
http://docs.python.org/lib/module-email.Message.html

You don't need to concern yourself with boundaries -- a high-level
parser is provided.

Here's a simple example:

This script:

msg_text = """
[snip -- message is some plain text plus an attached file]
"""

import email
pmsg = email.message_from_string(msg_text)
for part in pmsg.walk():
    print part.get_content_type(), part.get_filename("NoFileName")

produced this output:

multipart/mixed NoFileName
text/plain NoFileName
application/octet-stream Extract.py

For a more comprehensive example, see
http://docs.python.org/lib/node597.html

HTH,
John


Re: The lib email parse problem...

2006-08-29 Thread 叮叮当当
This is not enough.

When a part is multipart/alternative, I must find out which subpart I
need, not all the subparts, so I must know where the alternative
ends.



Re: The lib email parse problem...

2006-08-29 Thread Max M
叮叮当当 wrote:
 This is not enough.
 
 When a part is multipart/alternative, I must find out which subpart I
 need, not all the subparts, so I must know where the alternative
 ends.


Have you tried the email module at all?


-- 

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Phone:  +45 66 11 84 94
Mobile: +45 29 93 42 96

Re: The lib email parse problem...

2006-08-29 Thread 叮叮当当
Suppose an email part like this:

Content-Type: Multipart/Alternative;
  boundary="Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm"


--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/plain; charset=gb2312
Content-Transfer-Encoding: 7bit

   abcd.
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/html; charset=gb2312
Content-Transfer-Encoding: quoted-printable

.
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm--

The plain text is "abcd", and the alternative content type is text/html.
I prefer to render the HTML content, and I must not render both
parts, so I want to find where the boundary ends.

Thanks, all.
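A minimal Python 3 sketch showing that the parser already knows where this alternative ends. It wraps the OP's part in a minimal top-level message; the HTML body `<p>abcd.</p>` is a hypothetical stand-in for the body elided in the post. The container's get_payload() returns exactly its subparts, and get_boundary() exposes the boundary string should it ever be needed.

```python
import email

# The OP's example as a top-level message; the HTML body is hypothetical.
RAW = """\
MIME-Version: 1.0
Content-Type: Multipart/Alternative;
  boundary="Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm"

--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/plain; charset=gb2312
Content-Transfer-Encoding: 7bit

   abcd.
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/html; charset=gb2312
Content-Transfer-Encoding: quoted-printable

<p>abcd.</p>
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm--
"""

msg = email.message_from_string(RAW)
print(msg.get_content_type())  # multipart/alternative (type names are lowercased)
print(msg.get_boundary())      # Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm

# The alternative "ends" where this list ends: the body has already been
# split at each boundary, one Message per subpart.
children = msg.get_payload()
print([c.get_content_type() for c in children])  # ['text/plain', 'text/html']

# Prefer the last (richest) subpart; decode its transfer encoding and charset.
last = children[-1]
html = last.get_payload(decode=True).decode(last.get_content_charset())
print(html)
```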



Re: The lib email parse problem...

2006-08-29 Thread 叮叮当当
I am just using the email module.


Re: The lib email parse problem...

2006-08-29 Thread Fredrik Lundh
 wrote:

 The plain text is "abcd", and the alternative content type is text/html.
 I prefer to render the HTML content, and I must not render both
 parts, so I want to find where the boundary ends.

so use the email module:

import email

message_text = ...

message = email.message_from_string(message_text)

for part in message.walk():
    if part.get_content_type() == "text/html":
        print "html is", repr(part.get_payload())

(message instances contain either a payload or a sequence of submessages;
use message.is_multipart() to see which one it is.  the walk() method
used in this example loops over all submessages, in message order).

/F 
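To make Fredrik's description of walk() concrete, a Python 3 sketch on a hypothetical nested message: walk() is a depth-first traversal that yields the multipart containers as well as the leaves, and each container's get_payload() lists exactly its own direct children, which is where an alternative ends.

```python
import email

# Hypothetical sample: mixed[alternative[plain, html], attachment]
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain

plain
--INNER
Content-Type: text/html

<p>html</p>
--INNER--

--OUTER
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAEC
--OUTER--
"""

msg = email.message_from_string(RAW)

# Depth-first, containers included: the flat order hides nesting.
order = [p.get_content_type() for p in msg.walk()]
print(order)
# ['multipart/mixed', 'multipart/alternative', 'text/plain', 'text/html', 'application/octet-stream']

# The nesting is still available: each container lists only its direct children.
for part in msg.walk():
    if part.is_multipart():
        kids = [c.get_content_type() for c in part.get_payload()]
        print(part.get_content_type(), "->", kids)
# multipart/mixed -> ['multipart/alternative', 'application/octet-stream']
# multipart/alternative -> ['text/plain', 'text/html']
```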





Re: The lib email parse problem...

2006-08-29 Thread Steve Holden
In other words, you *haven't* tried the email module.

email.Parser can cope with arbitrarily complex message structures, 
including oddities like attachments which are themselves email messages 
containing their own attachments.

Read the documentation and look for sample code, then get back to the 
list with questions about how to make email do what you want it to.

Please don't ask us to re-invent existing libraries. That's why the
libraries are there.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
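Steve's remark that the parser copes with attachments that are themselves email messages, sketched in Python 3. The message is built with the standard email.mime classes (all contents hypothetical) and then re-parsed from text, so the tree printed below comes from the parser rather than the builder.

```python
import email
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message that will be attached whole.
inner = MIMEText("inner body")
inner["Subject"] = "forwarded"

outer = MIMEMultipart()               # multipart/mixed by default
outer["Subject"] = "outer"
outer.attach(MIMEText("see attached"))
outer.attach(MIMEMessage(inner))      # message/rfc822 part wrapping `inner`

# Serialize and re-parse, so the structure is produced by the parser.
parsed = email.message_from_string(outer.as_string())
types = [p.get_content_type() for p in parsed.walk()]
print(types)
# ['multipart/mixed', 'text/plain', 'message/rfc822', 'text/plain']

attached = parsed.get_payload(1).get_payload(0)  # the embedded message itself
print(attached["Subject"])            # forwarded
```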


Re: The lib email parse problem...

2006-08-29 Thread 叮叮当当
This is just a temporary solution for the simplest email format, like my
example, and I cannot always show only the HTML part.

But in fact, there are many more complicated mail formats.

BTW, I know how to use walk(); that is not the question.

My code is as follows:

def mail_content(mail):
    content = ''
    for part in mail.walk():
        if part.is_multipart():
            continue

        ch = part.get_content_charset()
        if ch:
            content += unicode(part.get_payload(decode=True), ch).encode('utf-8')
        else:
            content += part.get_payload(decode=True).decode('gb2312').encode('utf-8')
    return content


Re: The lib email parse problem...

2006-08-29 Thread 叮叮当当
I know how to use the email module.

The question is about how to handle the RFC 1521 MIME
multipart/alternative part.

I know email can handle multipart, but the alternative subpart is
special.




Re: The lib email parse problem...

2006-08-29 Thread Fredrik Lundh
 wrote:

 BTW, I know how to use walk(); that is not the question.

so what is the question?

/F 





Re: The lib email parse problem...

2006-08-29 Thread John Machin
叮叮当当 wrote:
 This is not enough.

 When a part is multipart/alternative, I must find out which subpart I
 need, not all the subparts, so I must know where the alternative
 ends.


So you'll have to write your own tree-walker. It would seem that
is_multipart(), get_content_type() and get_payload() are the important
methods.

Here's a quickly lashed-up example:

def choose_one(part, html_ok=False):
    last = None
    for subpart in part.get_payload():
        if html_ok or "html" not in subpart.get_content_type():
            last = subpart
    return last

def traverse(part, html_ok=False):
    mp = part.is_multipart()
    ty = part.get_content_type()
    print "multi:%r type:%r file:%r" % (mp, ty,
        part.get_filename("NoFileName"))
    if mp:
        if ty == "multipart/alternative":
            chosen = choose_one(part, html_ok=html_ok)
            traverse(chosen, html_ok=html_ok)
        else:
            for subpart in part.get_payload():
                traverse(subpart, html_ok=html_ok)

import email
pmsg = email.message_from_string(msg_text)
for toggle in (True, False):
    print "--- html_ok is %r ---" % toggle
    traverse(pmsg, html_ok=toggle)

With a suitable message, this produced:

--- html_ok is True ---
multi:True type:'multipart/alternative' file:'NoFileName'
multi:False type:'text/html' file:'NoFileName'
--- html_ok is False ---
multi:True type:'multipart/alternative' file:'NoFileName'
multi:False type:'text/plain' file:'NoFileName'
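In Python 3.6 and later the email package can make this choice itself. A minimal sketch with a hypothetical sample message: parsing with policy=email.policy.default returns an email.message.EmailMessage, whose get_body() searches the tree (skipping attachments) and returns the best match from preferencelist, while iter_attachments() yields the top-level parts that are not body candidates. This is the modern library API, not the hand-written walker above.

```python
import email
from email import policy

# Hypothetical sample: mixed[alternative[plain, html], attachment]
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset=utf-8

hello plain
--INNER
Content-Type: text/html; charset=utf-8

<p>hello html</p>
--INNER--

--OUTER
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="data.bin"
Content-Transfer-Encoding: base64

AAEC
--OUTER--
"""

# policy.default makes the parser build EmailMessage objects.
msg = email.message_from_string(RAW, policy=policy.default)

# Pick one part of the alternative by preference order.
html_part = msg.get_body(preferencelist=("html", "plain"))
plain_part = msg.get_body(preferencelist=("plain",))
print(html_part.get_content_type(), repr(html_part.get_content()))
print(plain_part.get_content_type(), repr(plain_part.get_content()))

# Attachments are reachable separately and are never chosen as the body.
print([a.get_filename() for a in msg.iter_attachments()])  # ['data.bin']
```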
