Hello,

I first noticed that my .mbox file doesn't get parsed by MboxParser.
Later, after debugging the Tika source code, I found what the problem
is: the default detector doesn't recognize the file as the
"application/mbox" MIME type. Although the file extension is .mbox, the
detector ignores this hint, because its "magic" detection, which looks
at some number of initial bytes, decides the file is "text/html" and
returns that. As a consequence, the parsing never reaches the correct
parser.

Is there some way I could override this magic detection and enforce
that detection in this case is based solely on file extension for
these .mbox files?
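
To make the behaviour I am asking for concrete, here is a minimal sketch
of the decision rule in plain Java. It deliberately uses no Tika classes:
the class name ExtensionFirstDetector, the detect method and the
magicResult parameter are all hypothetical names made up for this
illustration. My assumption is that in Tika the same rule would go into a
custom class implementing org.apache.tika.detect.Detector that wraps the
default detector, but I have not verified that wiring.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: it only illustrates the rule
// "trust the .mbox extension over whatever the magic-byte sniffing says".
final class ExtensionFirstDetector {

    // Media type to force for files whose name ends in .mbox.
    static final String MBOX_TYPE = "application/mbox";

    private ExtensionFirstDetector() {
    }

    /**
     * @param fileName    the resource name hint, may be null
     * @param magicResult the type the content-based detection returned
     * @return "application/mbox" when the name ends in .mbox (case-insensitive),
     *         otherwise the content-based result unchanged
     */
    static String detect(String fileName, String magicResult) {
        if (fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".mbox")) {
            return MBOX_TYPE;
        }
        return magicResult;
    }
}
```

With this rule, detect("export.mbox", "text/html") gives
"application/mbox", while any other file name keeps the content-based
result.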

-Vjeran

#################################################################################
Anyway, here is the beginning of my MBOX file, which I got by exporting
my Gmail emails from Google:


>From 1540828415824941917@xxx Mon Jul 25 12:08:06 +0000 2016
X-GM-THRID: 1540828415824941917
X-Gmail-Labels: Inbox,Important,clojure
Delivered-To: [email protected]
Received: by 10.31.56.17 with SMTP id f17csp1614203vka;
        Mon, 25 Jul 2016 05:08:06 -0700 (PDT)
X-Received: by 10.202.95.133 with SMTP id t127mr8226795oib.80.1469448485990;
        Mon, 25 Jul 2016 05:08:05 -0700 (PDT)
Return-Path: <[email protected]>
Received: from o1678940x148.outbound-mail.sendgrid.net
(o1678940x148.outbound-mail.sendgrid.net. [167.89.40.148])
        by mx.google.com with ESMTPS id k58si11358370otb.279.2016.07.25.05.08.05
        for <[email protected]>
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 25 Jul 2016 05:08:05 -0700 (PDT)
Received-SPF: pass (google.com: domain of
[email protected] designates
167.89.40.148 as permitted sender) client-ip=167.89.40.148;
Authentication-Results: mx.google.com;
       dkim=pass [email protected];
       dkim=pass [email protected];
       spf=pass (google.com: domain of
[email protected] designates
167.89.40.148 as permitted sender)
smtp.mailfrom=bounces+2693180-18a0-vmarcinko=gmail....@m.dripemail2.com
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=dripemail2.com;
h=content-type:from:mime-version:subject:to; s=s1;
bh=wbY8sP/TelOpmU6q09dgY8v3muI=; b=Vo/m0Lx7f8jNAHU2m0vLO6StuGms/
XeJeiLBV4CHyhwMNr4UuuBIJmDVGIuv6YGSJPN9REUYVuCqFyaPOAZiBtlie8Awq
7uB7KxZKnFPDh/7XQRz1Z1kKx0dGiENBOoymZFglCebm9my2i+trZ6EzN4YFOB/+
ZNpksoRirEVhws=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sendgrid.info;
h=content-type:from:mime-version:subject:to:x-feedback-id;
s=smtpapi; bh=wbY8sP/TelOpmU6q09dgY8v3muI=; b=vnSfe24bbcPSeungct
GphBd1h4S4i96PxeapkjmxCLyzeItTItNETiCtkLFbGnzFTVYVvzDOmcI47BYFHu
yOM0kILRdMzFt1d7HNVE1EJCB0DHVS83Yk7vaH/jc+IU34jJgZBlG0yR292QYtYk
7WA4ETOIQnQ+3K3pJ+wUYNGKs=
Received: by filter0448p1mdw1.sendgrid.net with SMTP id
filter0448p1mdw1.23984.5796012246
        2016-07-25 12:08:02.669274519 +0000 UTC
Received: from MjY5MzE4MA (ec2-54-210-139-199.compute-1.amazonaws.com
[54.210.139.199])
by ismtpd0002p1iad1.sendgrid.net (SG) with HTTP id zyxIxF_lRFKgFZxIoq9BKA
for <[email protected]>; Mon, 25 Jul 2016 12:08:02.739 +0000 (UTC)
Content-Type: multipart/alternative;
boundary=0082ce9e57fb837e9dfa9ca77bc69f450567ae3138b24a5db1e7237fc121
Date: Mon, 25 Jul 2016 12:08:02 +0000
From: "Eric at PurelyFunctional.tv" <[email protected]>
Mime-Version: 1.0
Subject: Twitter Bot, Atom Editor, and Scraping HTML
To: [email protected]
Message-ID: <[email protected]>
X-SG-EID: 
pywWA7gL46oOK7j8609IHsuM8bBS72IBx+uWB+d8D/N9t0rE4+TMmdgXQpvC7JIN3ekubbU2qCgHqS
 7W8GJ+aKX8qAKYokC5jzRvyv4CX3KHlasoMaqSUGqYEuHYx1e9vMNhqBIB4+nZN4uZmnKvRrvnYMZy
 NtpRNDKB0S28xjv5CxGmqbRggtf8RLQ7d2s5RIuQwIMIZQ3nLl3OrnmbjtZAP91VtQFkbhRATrKx7i
 o=
X-SG-ID: 
6l1ICXxVk1U2NQBE+KPgx+uy7/oBj9jrT6lO2L7BaL4cap+kBh3uUy+RmDmEF7s+mSBwxVfvlgfHyu
 osKIvS9Q==
X-Feedback-ID: 
2693180:l1fkQA9YLlZ4PTqywTL3Zu+zLq2XYmkeuiZ1WV+xvFE=:l1fkQA9YLlZ4PTqywTL3Zu+zLq2XYmkeuiZ1WV+xvFE=:SG

--0082ce9e57fb837e9dfa9ca77bc69f450567ae3138b24a5db1e7237fc121
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Mime-Version: 1.0

Dear Clojurist,

Thanks again for being there. I am so lucky to have you here on
my PurelyFunctional.tv email list.

A lot of people ask me what it takes to be hirable in Clojure. Of
course, the answer is complicated, but the short version is "not
very much". I wrote about it.

Read What do I have to learn to be hirable in Clojure? ( http://t.dripemail=
2.com/c/eyJhY2NvdW50X2lkIjoiMzY1MTcxNyIsImRlbGl2ZXJ5X2lkIjoiMjE3NTQ4MzEyIiw=
idXJsIjoiaHR0cDovL3d3dy5saXNwY2FzdC5jb20vaGlyYWJsZS1pbi1jbG9qdXJlP19fcz15bj=
R6dm8xcnY5cGhkazR4cG11diJ9 )
