Hi

Not sure if this is of interest, but CXF can manage subparts; see the last code fragment at:
http://cxf.apache.org/docs/jax-rs-multiparts.html#JAX-RSMultiparts-Formsandmultiparts

So, in Tika Server we can have a dedicated method on a unique path such as:


    @POST
    @Consumes("multipart/form-data")
    @Produces("text/plain")
    @Path("form/files")
    public StreamingOutput getTextFromMultipartFiles(@Multipart("files") List<Attachment> attachments,
                                                     @Context final UriInfo info) {
        ...
    }

where the 'attachments' list represents the subparts of a part named 'files', which appears to be the standard name for a composite part:

http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2

I'm not sure how getTextFromMultipartFiles needs to be implemented, but my point is that we can have those subparts made available to Tika parsers at the Tika Server level.
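
For illustration only, here is one rough shape the body of getTextFromMultipartFiles could take: loop over the subparts and stream the extracted text of each one to the response. This is a sketch under assumptions, not a proposal for the actual Tika Server code. To keep it runnable with the JDK alone, each subpart is represented by a plain InputStream (in the resource method it would presumably come from attachment.getObject(InputStream.class)), and the hypothetical TextExtractor interface stands in for whatever Tika parser call is used. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class MultipartTextSketch {

    /** Hypothetical stand-in for a Tika parser call: reads one subpart, writes its text. */
    @FunctionalInterface
    public interface TextExtractor {
        void extract(InputStream subpart, Writer out) throws IOException;
    }

    /**
     * Writes the text of every subpart to 'out', one subpart after another.
     * In the resource method this loop would live inside the returned
     * StreamingOutput, with 'out' being the response stream.
     */
    public static void writeTextOfParts(List<InputStream> subparts,
                                        TextExtractor extractor,
                                        OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (InputStream subpart : subparts) {
            // Close each subpart stream once its text has been extracted.
            try (InputStream in = subpart) {
                extractor.extract(in, writer);
            }
            writer.write('\n'); // separate the text of consecutive subparts
        }
        writer.flush(); // 'out' itself is left open for the caller to close
    }

    /** Trivial extractor used only for the demo: treats the subpart as UTF-8 text. */
    public static void copyAsUtf8(InputStream in, Writer out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        out.write(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<InputStream> subparts = Arrays.<InputStream>asList(
                new ByteArrayInputStream("first".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("second".getBytes(StandardCharsets.UTF_8)));
        // Prints "first" and "second", each on its own line.
        writeTextOfParts(subparts, MultipartTextSketch::copyAsUtf8, System.out);
    }
}
```

In the real method the extractor would delegate to a Tika parser, and the filename from each subpart's Content-Disposition header could be passed along as a detection hint; both of those details are left out here.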

Cheers, Sergey




On 13/10/15 13:27, Luís Filipe Nassif wrote:
Tika uses mime4j to parse rfc822 mails, and I think mime4j currently
  has no support for that. See MIME4J-109 and MIME4J-206.

Luis

2015-10-13 1:32 GMT-03:00 Sergey Tsalkov:

    rfc822 email files can contain attachments as subparts, and they'll
    generally specify the filename of the attachment in a manner like
    this:

    Content-Disposition: attachment;
             filename*=utf-8''image001.jpg

    Tika doesn't seem to be grabbing that information at all! Do you guys
    think this is a bug, or am I doing something wrong?

    Thanks!
    Sergey




--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/
