[
https://issues.apache.org/jira/browse/JAMES-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676106#comment-16676106
]
Tellier Benoit commented on JAMES-2581:
---------------------------------------
I agree that we may choose a more modular design allowing functionality re-use.
But currently we have only a single viable text extractor. Introducing new
configuration files seems like real overkill to me, at the very least for
now.
> Configurable ContentType blacklist for Tika
> -------------------------------------------
>
> Key: JAMES-2581
> URL: https://issues.apache.org/jira/browse/JAMES-2581
> Project: James Server
> Issue Type: Improvement
> Reporter: Dat Pham
> Priority: Major
>
> Enhanced production logging of failing Tika calls highlights the fact that
> our installation of Tika **cannot** handle some kinds of attachments.
> Here is a log example:
> ```
> org.apache.http.client.HttpResponseException: Unsupported Media Type
> at
> org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)
> at org.apache.http.client.fluent.Response.handleResponse(Response.java:90)
> at org.apache.http.client.fluent.Response.returnContent(Response.java:97)
> at
> org.apache.james.mailbox.tika.TikaHttpClientImpl.recursiveMetaDataAsJson(TikaHttpClientImpl.java:62)
> at
> org.apache.james.mailbox.tika.TikaTextExtractor.performContentExtraction(TikaTextExtractor.java:86)
> at
> org.apache.james.mailbox.tika.TikaTextExtractor.lambda$extractContent$0(TikaTextExtractor.java:81)
> ```
> (131 matches in the last 2 days)
> Here is a list of content types we repeatedly fail on:
> - application/ics
> - application/zip
> - application/pgp-signature
> - image/jpg
> - image/jpeg
> - image/png
> - message/delivery-status
> As an admin, I should be able to specify in the `tika.properties` file a
> comma-separated list of content types to blacklist.
> Benefits:
> - Avoid known-to-be-failing Tika calls: this reduces log output
> - Avoid transmitting potentially big payloads over the network for nothing:
> this improves performance
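
To make the proposal above concrete, here is a minimal sketch of how a comma-separated blacklist could be read from `tika.properties` and checked before calling Tika. It is not the James implementation: the property key `tika.contentType.blacklist`, the class `ContentTypeBlacklist` and its methods are hypothetical names chosen for illustration, and the sketch assumes matching is done on the media type only, ignoring parameters and case.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of James: holds the blacklisted content types.
final class ContentTypeBlacklist {
    // Assumed property key; the issue only says the list lives in tika.properties.
    static final String BLACKLIST_KEY = "tika.contentType.blacklist";

    private final Set<String> blacklisted;

    private ContentTypeBlacklist(Set<String> blacklisted) {
        this.blacklisted = blacklisted;
    }

    // Parses the comma-separated value; a missing or empty property yields an empty blacklist.
    static ContentTypeBlacklist fromProperties(Properties properties) {
        String raw = properties.getProperty(BLACKLIST_KEY, "");
        Set<String> types = Arrays.stream(raw.split(","))
            .map(ContentTypeBlacklist::normalize)
            .filter(type -> !type.isEmpty())
            .collect(Collectors.toSet());
        return new ContentTypeBlacklist(types);
    }

    // Drops parameters such as "; charset=utf-8", trims whitespace and lower-cases the media type.
    static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return mediaType.trim().toLowerCase(Locale.ROOT);
    }

    // True when text extraction should be skipped for this content type.
    boolean isBlacklisted(String contentType) {
        return contentType != null && blacklisted.contains(normalize(contentType));
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(BLACKLIST_KEY, "application/ics, application/zip,image/png");
        ContentTypeBlacklist blacklist = fromProperties(properties);

        System.out.println(blacklist.isBlacklisted("application/zip"));       // true
        System.out.println(blacklist.isBlacklisted("IMAGE/PNG; name=a.png")); // true
        System.out.println(blacklist.isBlacklisted("text/plain"));            // false
    }
}
```

A text extractor could consult `isBlacklisted` first and return an empty result for blacklisted attachments, so neither the HTTP call nor the resulting error log occurs.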