Hey Vjeran,
I think this was a bug in an earlier version of Tika. I just tried
your example with Tika 1.10 (the latest) and the main file got
correctly detected as message/rfc822 with its subparts getting
detected as text/plain, text/html, and image/jpeg -- the correct
behavior!
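
For anyone who wants to double-check the expected part types independently of Tika, here is a minimal sketch using Python's standard `email` package. It is not Tika and says nothing about how Tika detects types; it only walks the MIME tree. The message is a trimmed-down stand-in for the quoted one, and the boundaries ("outer", "inner"), the file name "cake.jpg", and the helper name `leaf_types` are made up for illustration.

```python
from email import message_from_string

# Trimmed-down stand-in for the quoted message (an assumption, not the
# original file): same nesting, placeholder boundaries and bodies.
RAW = """\
MIME-Version: 1.0
Subject: I am serious!
Content-Type: multipart/mixed; boundary="outer"

This is a multi-part message in MIME format.
--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset=utf-8

This is *VERY IMPORTANT* text.
--inner
Content-Type: text/html; charset=utf-8

<html><body>This is <b>VERY IMPORTANT</b> text.</body></html>
--inner--

--outer
Content-Type: image/jpeg; name="cake.jpg"
Content-Transfer-Encoding: base64

/9j/4AAQSkZJRgABAgAAAQABAAD/
--outer--
"""


def leaf_types(raw):
    """Return the content types of all non-container MIME parts, in order."""
    msg = message_from_string(raw)
    # walk() visits the message and every nested part depth-first;
    # multipart containers are skipped so only the leaves remain.
    return [part.get_content_type() for part in msg.walk()
            if not part.is_multipart()]


if __name__ == "__main__":
    for content_type in leaf_types(RAW):
        print(content_type)
```

The leaf parts come out as text/plain, text/html, and image/jpeg, which matches the subparts listed above.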

Sergey

On Mon, Oct 12, 2015 at 12:53 AM, Vjeran Marcinko
<[email protected]> wrote:
> Hi,
>
> I took 2 .eml files from my Thunderbird, one with text/plain content and
> one with text/html, both with a few picture attachments. The first one got
> parsed via RFC822Parser, whereas the second one didn't get detected as
> message/rfc822 and got parsed by HTMLParser.
>
> Any suggestions on how to correct this so that HTML mail gets detected as
> message/rfc822?
>
> Here is the message source. I can see there is multipart/alternative in
> play here, since Thunderbird includes both types of content (probably for
> cases where the mail client cannot display HTML):
>
> From - Sun Oct 11 17:23:18 2015
> X-Mozilla-Status: 0001
> X-Mozilla-Status2: 00000000
> X-Mozilla-Keys:
> FCC: mailbox://[email protected]/Sent
> X-Identity-Key: id1
> X-Account-Key: account1
> To: [email protected]
> From: Vjeran Marcinko <[email protected]>
> Subject: I am serious!
> Message-ID: <[email protected]>
> Date: Sun, 11 Oct 2015 17:23:17 +0200
> X-Mozilla-Draft-Info: internal/draft; vcard=0; receipt=0; DSN=0; uuencode=0;
>  attachmentreminder=0
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
>  Thunderbird/38.3.0
> MIME-Version: 1.0
> Content-Type: multipart/mixed;
>  boundary="------------040507030005030300070403"
>
> This is a multi-part message in MIME format.
> --------------040507030005030300070403
> Content-Type: multipart/alternative;
>  boundary="------------040402060408060209060509"
>
>
> --------------040402060408060209060509
> Content-Type: text/plain; charset=utf-8; format=flowed
> Content-Transfer-Encoding: 7bit
>
> This is *VERY IMPORTANT* text.
>
> I attached the pic of birthday cake, so you tell me if you like it.
>
> Also, the pic of my town is there also.
>
> Bye,
> Steve
>
>
> --------------040402060408060209060509
> Content-Type: text/html; charset=utf-8
> Content-Transfer-Encoding: 7bit
>
> <html>
>   <head>
>
>     <meta http-equiv="content-type" content="text/html; charset=utf-8">
>     <title>I am serious!</title>
>   </head>
>   <body text="#000000" bgcolor="#FFFFFF">
>     This is <b>VERY IMPORTANT</b> text.<br>
>     <br>
>     I attached the pic of birthday cake, so you tell me if you like it.<br>
>     <br>
>     Also, the pic of my town is there also.<br>
>     <br>
>     Bye,<br>
>     Steve<br>
>     <br>
>   </body>
> </html>
>
> --------------040402060408060209060509--
>
> --------------040507030005030300070403
> Content-Type: image/jpeg;
>  name="1240378_759900384071242_8612750244479085543_n.jpg"
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
>  filename="1240378_759900384071242_8612750244479085543_n.jpg"
>
> /9j/4AAQSkZJRgABAgAAAQABAAD/7QA2UGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAABkcAmcA
> FHlxX1NUTHFZdkZfSEl5X2JQSDlGAP/iAhxJQ0NfUFJPRklMRQABAQAAAgxsY21zAhAAAG1u
> dHJSR0IgWFlaIAfcAAEAGQADACkAOWFjc3BBUFBMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>
>
