Check TIKA-879, where a general solution was discussed. These problems with RFC822 detection come up very often.
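
As a possible stopgap for the question quoted below, a custom-mimetypes.xml entry could add magic for the headers that open the excerpt. This is an untested sketch, not the solution from TIKA-879: the choice of header names (taken from the quoted message), the priority value, and the assumption that Tika 1.13 merges extra magic into the existing message/rfc822 type are all guesses that need checking. As far as I know, the file is picked up from the classpath as org/apache/tika/mime/custom-mimetypes.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical custom-mimetypes.xml; untested against Tika 1.13 -->
<mime-info>
  <mime-type type="message/rfc822">
    <!-- priority is an assumption; tune it if text/html still wins -->
    <magic priority="50">
      <!-- headers that start the Gmail-exported message in the excerpt -->
      <match value="X-GM-THRID:" type="string" offset="0"/>
      <match value="X-Gmail-Labels:" type="string" offset="0"/>
      <match value="Delivered-To:" type="string" offset="0"/>
    </magic>
  </mime-type>
</mime-info>
```

Matching only at offset 0 keeps the rule narrow, so it should not misclassify HTML or text files that merely mention these header names further down.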
Luis

2016-07-28 4:36 GMT-03:00 Vjeran Marcinko <[email protected]>:

> Hello again,
>
> Just as I resolved the problem with MBOX parser, I noticed that it
> doesn't correctly detect contained RFC822 messages as message/rfc822,
> but usually text/html or some variation of it.
>
> And question as before, is there some workaround for 1.13 to place in
> custom-mimetypes.xml that would fix this?
>
> Here is a start of one such message from my mbox file (I omitted
> MBOX message start line "From " which just marks start of each
> contained message), because this is sent to embedded parser which
> doesn't recognize this as RFC822 type. I even extracted this portion
> of content to separate file and convinced myself that Tika truly doesn't
> detect this as RFC822
>
> X-GM-THRID: 1512463556322914280
> X-Gmail-Labels: Inbox,clojure
> Delivered-To: [email protected]
> Received: by 10.31.204.67 with SMTP id c64csp1943840vkg;
>     Wed, 16 Sep 2015 03:00:48 -0700 (PDT)
> X-Received: by 10.140.238.214 with SMTP id
> j205mr1658705qhc.21.1442397647994;
>     Wed, 16 Sep 2015 03:00:47 -0700 (PDT)
> Return-Path: <
> m-86i29s6rppu2flx1nqebu0g0hk5wgxj5s0vlvfx11p94yc32jypnkf41...@bounce.linkedin.com
> >
> Received: from mailb-af.linkedin.com (mailb-af.linkedin.com.
> [108.174.3.150])
>     by mx.google.com with ESMTPS id
> q7si21212015qki.84.2015.09.16.03.00.47
>     for <[email protected]>
>     (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
>     Wed, 16 Sep 2015 03:00:47 -0700 (PDT)
> Received-SPF: pass (google.com: domain of
>
> m-86i29s6rppu2flx1nqebu0g0hk5wgxj5s0vlvfx11p94yc32jypnkf41...@bounce.linkedin.com
> designates 108.174.3.150 as permitted sender) client-ip=108.174.3.150;
> Authentication-Results: mx.google.com;
>     spf=pass (google.com: domain of
>
> m-86i29s6rppu2flx1nqebu0g0hk5wgxj5s0vlvfx11p94yc32jypnkf41...@bounce.linkedin.com
> designates 108.174.3.150 as permitted sender)
> smtp.mailfrom=
> m-86i29s6rppu2flx1nqebu0g0hk5wgxj5s0vlvfx11p94yc32jypnkf41...@bounce.linkedin.com
> ;
>     dkim=pass [email protected];
>     dmarc=pass (p=REJECT dis=NONE) header.from=linkedin.com
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linkedin.com;
>     s=proddkim1024; t=1442397647;
>     bh=ZsM2cpYAX84d5ECwhjitGaKaCqYUJu7THSfox9AGoGs=;
>     h=From:Subject:MIME-Version:Content-Type:To:Date:X-LinkedIn-Class:
>     X-LinkedIn-Template:X-LinkedIn-fbl;
>     b=1rRg1j7tjk4zOq0f/yFbL4EbM2JuVP9c5yKr7FdpYYdoTRytYoLbdXjLrawfgvgh+
>     dJ7L20UCIOrIyft1tez88CK/NkJ9g0fuor4klj+lpQ57NN/XURbXukRwJBwWpCGJ+g
>     pYc3hZgxJ/DrKILG1xTfoUO9qW3AziA6CGCNprr4=
> From: Paulina Peczkowska <[email protected]>
> Message-ID: <
> 620370407.80198.1442397647784.javamail....@lva1-app2979.prod.linkedin.com>
> Subject: =?UTF-8?Q?BIG_DATA_Developer/_Engineer_Wante?=
>     =?UTF-8?Q?d!_=E2=80=93_Job_offer_in_WrocLove,_Poland?=
> MIME-Version: 1.0
> Content-Type: multipart/mixed;
>     boundary="----=_Part_80197_1293222758.1442397647784"
> To: Vjeran Marcinko <[email protected]>
> Date: Wed, 16 Sep 2015 10:00:47 +0000 (UTC)
> X-LinkedIn-Class: INMAIL
> X-LinkedIn-Template: inmail_sent
> X-LinkedIn-fbl:
>
> m2-aszuze4gtmmy1h9ue2u7ub4ja7lqa8rsm59a91z648i3mnj3ljeeftgfblvpeyfttcogcdluvmwr6zwye8x4iqxk9nfut2ks3v79en
> X-LinkedIn-Id: 5t9t4k-iemmbb9o-1s
> List-Unsubscribe:
> <mailto:[email protected]
> ?subject=unsubscribe/AQFzeuoICPGM0QAAAU_VmXOnwp9hn8S4D89ESIFfgOZDuW-H1luaGdeqgtrsCStdPHfZwYCxr1-9TPs/5t9t4k-iemmbb9o-1s/m2-aszuze4gtmmy1h9ue2u7ub4ja7lqa8rsm59a91z648i3mnj3ljeeftgfblvpeyfttcogcdluvmwr6zwye8x4iqxk9nfut2ks3v79en>
> Reply-To: Paulina Peczkowska
> <[email protected]>
> Feedback-ID: inmail_sent:linkedin
>
> ------=_Part_80197_1293222758.1442397647784
> Content-Type: multipart/alternative;
>     boundary="----=_Part_80194_920030466.1442397647781"
>
> ------=_Part_80194_920030466.1442397647781
> Content-Type: text/plain;charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
> Content-ID: text-body
>
> BIG DATA Developer/ Engineer Wanted! =E2=80=93 Job offer in WrocLove,
> Polan=
> d
>
> Dear Vjeran,
> <br>
>
> <br>
> My name is Paulina Pęczkowska and I’m a recruitment
> specialist=
> at IT Kontrakt GmbH=20
> <br>
> dedicated to international projects.=20
> <br>
>
