[ 
https://issues.apache.org/jira/browse/TIKA-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764475#action_12764475
 ] 

Ken Krugler commented on TIKA-295:
----------------------------------

Hi Jukka,

Is there an Eclipse formatter file that defines the Tika project's target 
format?

Thanks,

-- Ken

> Rough cut of mbox parser
> ------------------------
>
>                 Key: TIKA-295
>                 URL: https://issues.apache.org/jira/browse/TIKA-295
>             Project: Tika
>          Issue Type: New Feature
>    Affects Versions: 0.4
>            Reporter: Ken Krugler
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>         Attachments: tika-295.patch
>
>
> Attached is a patch with a first cut of a parser that handles mailbox (.mbox, 
> application/mbox) files.
> * The first email's headers are used to fill in metadata. Headers of 
> subsequent emails are discarded.
> * Charset handling needs to be fixed up. It's unclear (not spec'd) whether 
> emails individually use the charset as specified in their individual header, 
> or the entire file should be re-encoded (and the encoding is sent in the 
> response header, or auto-detected).
> * Multi-part emails won't be handled properly, though it's unclear what 
> should be done in that case (if anything).
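
For illustration only, the behaviour described in the first bullet could be sketched roughly as below. This is not the code in tika-295.patch; the class name MboxSketch, the Result holder and the parse method are hypothetical. It assumes the input has already been decoded to a String (so the open charset question is sidestepped), that messages are separated by lines starting with "From ", and it makes no attempt at multi-part handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the attached patch and not a Tika API.
class MboxSketch {

    // Simple holder: metadata from the first message, plus one body per message.
    static final class Result {
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<String> bodies = new ArrayList<>();
    }

    static Result parse(String mbox) {
        Result result = new Result();
        StringBuilder body = null;   // null until the first "From " separator is seen
        boolean inHeaders = false;
        int messageCount = 0;

        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                // Separator line: close the previous message, start a new one.
                if (body != null) {
                    result.bodies.add(body.toString());
                }
                body = new StringBuilder();
                inHeaders = true;
                messageCount++;
            } else if (body == null) {
                // Text before the first separator is ignored.
            } else if (inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false;           // blank line ends the header block
                } else if (messageCount == 1
                        && !line.startsWith(" ") && !line.startsWith("\t")) {
                    // Only the first message's headers become metadata.
                    // Folded continuation lines are skipped for simplicity.
                    int colon = line.indexOf(':');
                    if (colon > 0) {
                        result.metadata.putIfAbsent(
                                line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
                    }
                }
                // Headers of later messages are dropped.
            } else {
                body.append(line).append('\n');
            }
        }
        if (body != null) {
            result.bodies.add(body.toString());
        }
        return result;
    }
}
```

Calling MboxSketch.parse on a two-message mailbox would give a metadata map holding only the first message's headers and a list with two bodies. A real parser would also need to deal with ">From " escaping in bodies and with folded header lines, both of which are skipped here.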

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
