[ https://issues.apache.org/jira/browse/TIKA-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765410#action_12765410 ]
Alex Baranov commented on TIKA-295:
-----------------------------------

I guess since Tika is a subproject of Lucene, you should use the same format as for other Lucene projects:
http://wiki.apache.org/lucene-java/HowToContribute
http://wiki.apache.org/solr/HowToContribute
(at the end of the pages).

One question about the parser: are you still working on it? Any progress since the first draft?

> Rough cut of mbox parser
> ------------------------
>
>                 Key: TIKA-295
>                 URL: https://issues.apache.org/jira/browse/TIKA-295
>             Project: Tika
>          Issue Type: New Feature
>    Affects Versions: 0.4
>            Reporter: Ken Krugler
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>         Attachments: tika-295.patch
>
>
> Attached is a patch for a first-cut at a parser that handles mailbox (.mbox,
> application/mbox) files.
> * The first email headers are used to fill in metadata. Subsequent email
> headers are tossed.
> * Charset handling needs to be fixed up. It's unclear (not spec'd) whether
> emails individually use the charset as specified in their individual header,
> or the entire file should be re-encoded (and the encoding is sent in the
> response header, or auto-detected).
> * Multi-part emails won't be handled properly, though it's unclear what
> should be done in that case (if anything).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.