Thanks a bunch for the suggested workaround.

Also, I have checked, and the bug exists in the latest 1.4 nightly build.

-Vjeran

On Tue, Jul 26, 2016 at 2:22 AM, Luís Filipe Nassif <[email protected]> wrote:
> Hi,
>
> Based on https://en.wikipedia.org/wiki/Mbox, you can add the following entry
> in org/apache/tika/mime/custom-mimetypes.xml:
>
> <mime-type type="application/mbox">
>     <magic priority="70">
>         <match value="From " type="string" offset="0"/>
>     </magic>
>     <glob pattern="*.mbox"/>
> </mime-type>
>
> The priority must be greater than that of message/rfc822. It sometimes returns
> false positives, but it detects mbox files without an extension, which are
> very common.
>
> Luis
>
> 2016-07-25 16:36 GMT-03:00 Allison, Timothy B. <[email protected]>:
>>
>>     <repositories>
>>         <repository>
>>             <id>apache.snapshots</id>
>>             <name>Apache Development Snapshot Repository</name>
>>             <url>https://repository.apache.org/content/repositories/snapshots/</url>
>>             <releases>
>>                 <enabled>false</enabled>
>>             </releases>
>>             <snapshots>
>>                 <enabled>true</enabled>
>>             </snapshots>
>>         </repository>
>>     </repositories>
>>
>> -----Original Message-----
>> From: Vjeran Marcinko [mailto:[email protected]]
>> Sent: Monday, July 25, 2016 3:25 PM
>> To: [email protected]
>> Subject: Re: Problem with detection of .mbox file
>>
>> Thanks guys, I can do it in some clumsy way, but before I try it, is there
>> a Maven repo for such nightly builds that I can include to pull in these
>> 1.4-SNAPSHOT deps?
>>
>> On Mon, Jul 25, 2016 at 9:14 PM, Allison, Timothy B. <[email protected]>
>> wrote:
>> >> Can you try with a recent Tika nightly build?
>> > e.g.
>> > https://builds.apache.org/job/Tika-trunk/lastBuild/org.apache.tika$tika-app/
>> >
>> > -----Original Message-----
>> > From: Nick Burch [mailto:[email protected]]
>> > Sent: Monday, July 25, 2016 3:03 PM
>> > To: [email protected]
>> > Subject: Re: Problem with detection of .mbox file
>> >
>> > On Mon, 25 Jul 2016, Vjeran Marcinko wrote:
>> >> I first noticed that my .mbox file doesn't get parsed by MBoxParser,
>> >> and later, after debugging the Tika source code, I found what the problem
>> >> is - the default detector doesn't even recognize it as "application/mbox"
>> >> MIME type, and although the file extension is .mbox, it ignores this hint
>> >> because its "magic" way of detecting file type based on some amount
>> >> of initial bytes detects it as "text/html"
>> >
>> > Can you try with a recent Tika nightly build? Only there have been
>> > some tweaks done around that sort of thing recently
>> >
>> > If a nightly build / build from Git still shows the issue, please open a
>> > bug in Jira and attach a problematic file, then we can take a look!
>> >
>> > Nick
>
>
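
To illustrate why the magic rule quoted above can return false positives,
here is a minimal Java sketch of what a match on the string "From " at
offset 0 amounts to. This is not Tika's implementation: the class
MboxMagicSketch and the method startsWithMboxMagic are hypothetical names
used only for this illustration, and the sketch uses nothing beyond the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MboxMagicSketch {
    // The five bytes the quoted <match> rule looks for at offset 0.
    private static final byte[] MAGIC = "From ".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper (not a Tika API): true if the data begins with "From ".
    public static boolean startsWithMboxMagic(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(head, 0, MAGIC.length), MAGIC);
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // A typical mbox separator line: matches.
        System.out.println(startsWithMboxMagic(
                ascii("From alice@example.com Mon Jul 25 19:36:00 2016\n")));
        // Ordinary prose that happens to start with "From ": also matches,
        // which is the kind of false positive mentioned in the thread.
        System.out.println(startsWithMboxMagic(
                ascii("From the minutes of the last meeting\n")));
        // An RFC 822 style header starts with "From:" (colon, no space): no match.
        System.out.println(startsWithMboxMagic(
                ascii("From: alice@example.com\n")));
    }
}
```

The check itself says nothing about how the result is weighed against
other candidate types; that is what the priority attribute in the quoted
entry is for.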
