On 7/20/22 2:13 PM, Tilman Hausherr wrote:
I noticed you have "Accept: text/plain"

When I try this:

curl -T Get_Started_With_Smallpdf.pdf http://localhost:9998/tika --header "Accept: text/plain"

I get

Caused by: java.util.NoSuchElementException: No value present
         at java.util.OptionalInt.getAsInt(OptionalInt.java:130) ~[?:?]
         at org.apache.tika.server.core.ProduceTypeResourceComparator.compareProduceTypes(ProduceTypeResourceComparator.java:136) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
         at org.apache.tika.server.core.ProduceTypeResourceComparator.compare(ProduceTypeResourceComparator.java:97) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
         at org.apache.cxf.jaxrs.model.OperationResourceInfoComparator.compare(OperationResourceInfoComparator.java:69) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
         at org.apache.cxf.jaxrs.model.OperationResourceInfoComparator.compare(OperationResourceInfoComparator.java:31) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]
         at java.util.TreeMap.put(TreeMap.java:795) ~[?:?]
         at java.util.TreeMap.put(TreeMap.java:534) ~[?:?]
         at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:551) ~[tika-server-standard-2.4.2-SNAPSHOT.jar:2.4.2-SNAPSHOT]

without the header, I get the html output.

interesting catch. what _should_ it be for programmatic submission (e.g. via 
dovecot fts-tika) to tika?  text or html?

it's reported here in the tika logs I posted, earliest at

        ...
        DEBUG [qtp485047320-28] 11:01:15,794 org.eclipse.jetty.server.HttpChannel REQUEST for //127.0.0.1:9998/tika/ on HttpChannelOverHttp@2ab20b5f{s=HttpChannelState@1dd88b59{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=1,c=false/false,a=IDLE,uri=//127.0.0.1:9998/tika/,age=1}
        PUT //127.0.0.1:9998/tika/ HTTP/1.1
        Host: 127.0.0.1:9998
        Date: Wed, 20 Jul 2022 15:01:15 GMT
        Transfer-Encoding: chunked
        Connection: keep-alive
        Content-Type: application/pdf
        Content-Disposition: attachment; filename="Get_Started_With_Smallpdf.pdf"
!!      Accept: text/plain
        ...


which appears to be the PUT, I assume, pushed by the dovecot-end of the 
handshake.
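
For anyone wanting to replay that request outside dovecot, here is a minimal Python sketch. It is not dovecot's or Tika's code. It rebuilds the PUT with the three content headers shown in the log above. The helper names build_tika_put and send are made up for illustration, the URL is the one from the log, and unlike the logged request it sends a Content-Length body rather than chunked transfer encoding.

```python
import urllib.request

# Assumption: server address as it appears in the Jetty log above.
TIKA_URL = "http://127.0.0.1:9998/tika/"


def build_tika_put(body, filename, accept="text/plain"):
    """Build (but do not send) a PUT resembling the logged request.

    Pass accept=None to leave the Accept header out entirely.
    """
    headers = {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    if accept is not None:
        headers["Accept"] = accept
    return urllib.request.Request(
        TIKA_URL, data=body, headers=headers, method="PUT"
    )


def send(req):
    """Send the request; return (status, response Content-Type, body bytes).

    Error responses (4xx/5xx) surface as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.headers.get("Content-Type"), resp.read()
```

Calling send(build_tika_put(pdf_bytes, name)) and then the same with accept=None should show whether the Accept header alone changes what the server returns.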

checking dovecot source, it hails from here,

        https://github.com/dovecot/core/blob/main/src/plugins/fts/fts-parser-tika.c#L170

                if (parser_context->content_disposition != NULL)
                        http_client_request_add_header(http_req, "Content-Disposition",
                                                       parser_context->content_disposition);
!!      170     http_client_request_add_header(http_req, "Accept", "text/plain");

                parser->http_req = http_req;
                return &parser->parser;
        }

The '"Accept", "text/plain"' has been there a while; e.g., quick-checking old
release source for v2.3.8, from Oct 8, 2019,

        https://github.com/dovecot/core/blob/release-2.3.8/src/plugins/fts/fts-parser-tika.c#L163

