On 7/20/22 2:13 PM, Tilman Hausherr wrote:
I noticed you have "Accept: text/plain"
When I try this:
curl -T Get_Started_With_Smallpdf.pdf http://localhost:9998/tika --header "Accept: text/plain"
I get
Caused by: java.util.NoSuchElementException: No value present
at java.util.OptionalInt.getAsInt(OptionalInt.java:130) ~[?:?]
at org.apache.tika.server.core.ProduceTypeResourceComparator.compareProduceTypes(ProduceTypeResourceComparator.java:136) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
at org.apache.tika.server.core.ProduceTypeResourceComparator.compare(ProduceTypeResourceComparator.java:97) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
at org.apache.cxf.jaxrs.model.OperationResourceInfoComparator.compare(OperationResourceInfoComparator.java:69) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
at org.apache.cxf.jaxrs.model.OperationResourceInfoComparator.compare(OperationResourceInfoComparator.java:31) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
at java.util.TreeMap.put(TreeMap.java:795) ~[?:?]
at java.util.TreeMap.put(TreeMap.java:534) ~[?:?]
at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:551) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
Without the header, I get the HTML output.
interesting catch. what _should_ the Accept type be for programmatic submission
to tika (e.g. via dovecot fts-tika)? text or html?
the header is reported in the tika logs I posted, earliest at
...
DEBUG [qtp485047320-28] 11:01:15,794 org.eclipse.jetty.server.HttpChannel REQUEST for //127.0.0.1:9998/tika/ on HttpChannelOverHttp@2ab20b5f{s=HttpChannelState@1dd88b59{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=1,c=false/false,a=IDLE,uri=//127.0.0.1:9998/tika/,age=1}
PUT //127.0.0.1:9998/tika/ HTTP/1.1
Host: 127.0.0.1:9998
Date: Wed, 20 Jul 2022 15:01:15 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Content-Type: application/pdf
Content-Disposition: attachment; filename="Get_Started_With_Smallpdf.pdf"
!! Accept: text/plain
...
which appears to be the PUT request sent, I assume, by the dovecot end of the
exchange.
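To reproduce that request without dovecot in the loop, something like the following Python sketch might help. It assumes the endpoint and headers shown in the log excerpt above; the helper names `build_tika_put` and `send` are made up for illustration. Note that urllib sends a Content-Length body rather than the chunked transfer encoding dovecot uses, so this only approximates the real exchange.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Endpoint as it appears in the jetty log above (assumed to be a local tika-server).
TIKA_URL = "http://127.0.0.1:9998/tika/"


def build_tika_put(body: bytes, filename: str,
                   accept: Optional[str] = "text/plain") -> urllib.request.Request:
    """Build a PUT request carrying the headers seen in the log.

    Pass accept=None to omit the Accept header entirely, which matches
    the curl test that returned HTML output.
    """
    headers = {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    if accept is not None:
        headers["Accept"] = accept
    return urllib.request.Request(TIKA_URL, data=body, headers=headers, method="PUT")


def send(req: urllib.request.Request) -> Tuple[int, str]:
    """Send the request and return (status, body text), including for HTTP errors."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A server-side exception such as the one in the stack trace would land here.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `send(build_tika_put(data, "Get_Started_With_Smallpdf.pdf"))` and then the same with `accept=None` should make it easy to compare the two cases side by side against a running server.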
checking the dovecot source, the header comes from here,
https://github.com/dovecot/core/blob/main/src/plugins/fts/fts-parser-tika.c#L170
        if (parser_context->content_disposition != NULL)
                http_client_request_add_header(http_req,
                                               "Content-Disposition",
                                               parser_context->content_disposition);
!! 170  http_client_request_add_header(http_req, "Accept", "text/plain");
        parser->http_req = http_req;
        return &parser->parser;
}
The '"Accept", "text/plain"' header has been there a while; e.g., quick-checking the
old release source for v2.3.8, from Oct 8, 2019,
https://github.com/dovecot/core/blob/release-2.3.8/src/plugins/fts/fts-parser-tika.c#L163