Hi
Not sure if this is of interest, but CXF can manage subparts; see the last
code fragment at:
http://cxf.apache.org/docs/jax-rs-multiparts.html#JAX-RSMultiparts-Formsandmultiparts
So, in Tika Server we can have a dedicated method on a unique path such as:
@POST
@Consumes("multipart/form-data")
@Produces("text/plain")
@Path("form/files")
public StreamingOutput getTextFromMultipartFiles(
        @Multipart("files") List<Attachment> attachments,
        @Context final UriInfo info) {
    ...
}
where the 'attachments' list represents the subparts of a part named 'files',
which appears to be the standard name for a composite part:
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2
I'm not sure how getTextFromMultipartFiles needs to be implemented, but
my point is that we can have those subparts made available to Tika parsers
at the Tika Server level.
Cheers, Sergey
On 13/10/15 13:27, Luís Filipe Nassif wrote:
Tika uses mime4j to parse rfc822 mails, and I think mime4j currently
has no support for that. See MIME4J-109 and MIME4J-206.
Luis
2015-10-13 1:32 GMT-03:00 Sergey Tsalkov:
rfc822 email files can contain attachments as subparts, and they'll
generally specify the filename of the attachment in a manner like
this:
Content-Disposition: attachment;
filename*=utf-8''image001.jpg
Tika doesn't seem to be grabbing that information at all! Do you guys
think this is a bug, or am I doing something wrong?
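For reference, the filename*= form above is the RFC 2231 / RFC 5987
extended parameter syntax: charset'language'percent-encoded-bytes. Below is
a minimal sketch of decoding such a value in plain Java, independent of Tika
and mime4j. The class and method names are hypothetical, and the fallback to
US-ASCII when no charset is given is an assumption made for this sketch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tika or mime4j.
public class ExtendedParamDecoder {

    // Decodes a value such as utf-8''image001.jpg, i.e.
    // charset'language'percent-encoded-bytes.
    public static String decode(String value) {
        int first = value.indexOf('\'');
        int second = (first < 0) ? -1 : value.indexOf('\'', first + 1);
        if (second < 0) {
            // Not in extended form: return the value unchanged.
            return value;
        }
        String charsetName = value.substring(0, first);
        // Assumption: fall back to US-ASCII when no charset is named.
        Charset charset = charsetName.isEmpty()
                ? StandardCharsets.US_ASCII
                : Charset.forName(charsetName);
        String encoded = value.substring(second + 1);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // %XX is one raw byte in the declared charset.
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                // Unescaped characters are expected to be plain ASCII.
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("utf-8''image001.jpg")); // prints image001.jpg
    }
}
```

The sketch avoids java.net.URLDecoder on purpose, since that class turns
'+' into a space, which is not part of this syntax.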
Thanks!
Sergey
--
Sergey Beryozkin
Talend Community Coders
http://coders.talend.com/