Rough cut of mbox parser
------------------------

                 Key: TIKA-295
                 URL: https://issues.apache.org/jira/browse/TIKA-295
             Project: Tika
          Issue Type: New Feature
    Affects Versions: 0.4
            Reporter: Ken Krugler
Attached is a patch for a first cut of a parser that handles mailbox (.mbox, application/mbox) files.

* Only the headers of the first email are used to fill in metadata. Headers of subsequent emails are discarded.
* Charset handling needs to be fixed. It is not specified whether each email uses the charset declared in its own headers, or whether the entire file should be decoded with a single encoding (taken from the response header, or auto-detected).
* Multi-part emails won't be handled properly, though it's unclear what, if anything, should be done in that case.
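
For illustration only, the "first headers win" behaviour described above might look roughly like the sketch below. This is not the attached patch: the class name MboxSketch, the Result holder and the parse method are hypothetical, and the sketch assumes that any line beginning with "From " starts a new message. It does no charset decoding and makes no attempt to handle multi-part messages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the code from the attached patch.
class MboxSketch {

    /** Hypothetical holder: headers of the first message plus all body text. */
    static final class Result {
        final Map<String, String> firstHeaders = new LinkedHashMap<>();
        final StringBuilder bodies = new StringBuilder();
    }

    /**
     * Splits already-decoded mbox text into messages.
     * Assumption: every line starting with "From " is a message separator.
     */
    static Result parse(String mboxText) {
        Result result = new Result();
        int messageCount = 0;
        boolean inHeaders = false;
        String lastHeader = null;

        for (String line : mboxText.split("\\r?\\n")) {
            if (line.startsWith("From ")) {
                // Separator line: a new message begins with its header block.
                messageCount++;
                inHeaders = true;
                lastHeader = null;
                continue;
            }
            if (messageCount == 0) {
                continue; // ignore anything before the first separator
            }
            if (!inHeaders) {
                result.bodies.append(line).append('\n');
            } else if (line.isEmpty()) {
                inHeaders = false; // a blank line ends the header block
            } else if (messageCount == 1) {
                // Only the first message's headers are kept as metadata.
                boolean folded = line.charAt(0) == ' ' || line.charAt(0) == '\t';
                int colon = line.indexOf(':');
                if (folded && lastHeader != null) {
                    // Continuation of the previous (folded) header value.
                    result.firstHeaders.merge(lastHeader, line.trim(),
                            (a, b) -> a + " " + b);
                } else if (colon > 0) {
                    lastHeader = line.substring(0, colon).trim();
                    result.firstHeaders.put(lastHeader,
                            line.substring(colon + 1).trim());
                }
            }
            // Header lines of later messages fall through and are discarded.
        }
        return result;
    }
}
```

Calling MboxSketch.parse(text) on a two-message mailbox would return the first message's headers in firstHeaders and the text of both bodies in bodies. Taking a String sidesteps the open charset question: the caller has to pick an encoding before this code runs.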