i've installed tika-server-standard v2.4.0

launched with

        /etc/systemd/system/tika.service
                
                [Unit]
                Description=Apache Tika server
                After=network-online.target
                Requires=network-online.target

                [Service]
                SyslogIdentifier=tika
                User=tika
                Group=tika
                ExecStart=/usr/bin/java \
                 -Dpdfbox.fontcache=/var/tika \
                 -jar /srv/tika/tika-server.jar \
                 --host 127.0.0.1 \
                 --port 9998
                SuccessExitStatus=143

                [Install]
                WantedBy=multi-user.target

it's up

        systemctl status tika
                ● tika.service - Apache Tika server
                     Loaded: loaded (/etc/systemd/system/tika.service; enabled; vendor preset: disabled)
                     Active: active (running) since Tue 2022-05-24 11:52:49 EDT; 8min ago
                   Main PID: 24034 (java)
                      Tasks: 54 (limit: 8949)
                     Memory: 140.3M
                        CPU: 27.996s
                     CGroup: /system.slice/tika.service
                             ├─24034 /usr/bin/java -Dpdfbox.fontcache=/var/tika -jar /srv/tika/tika-server.jar --host 127.0.0.1 --port 9998
                             └─24069 java -Djava.awt.headless=true -cp /srv/tika/tika-server.jar -Dtika.server.id=ac3ee71e-988b-45a0-ad09-13425d8eeb01 org.apache.tika.server.core.TikaServerProcess -h 127.0.0.1 -p 9998 -i ac3ee71e-988b-45a0-ad09-13425d8eeb01 -forkedStatusFile /tmp/apache-tika-server-forked-tmp-16850490462322854594 -numRestarts 0

, responding,

        curl \
        -T /tmp/test.pdf \
        http://127.0.0.1:9998/meta

          pdf:unmappedUnicodeCharsPerPage,0,0,0,0,0,0,0,0,0,0,0,0,0,0
          pdf:PDFVersion,1.4
          xmp:CreatorTool,Adobe InDesign 15.1 (Macintosh)
          pdf:hasXFA,false
          access_permission:modify_annotations,true
          access_permission:can_print_degraded,true
          X-TIKA:Parsed-By-Full-Set,org.apache.tika.parser.DefaultParser,org.apache.tika.parser.pdf.PDFParser
          dcterms:created,2020-08-13T14:55:46Z
          language,en
          dcterms:modified,2020-09-24T23:38:28Z
          dc:format,application/pdf; version=1.4
          xmpMM:DocumentID,xmp.id:8a612346-9d03-4caf-8ebf-da6f3716ed0a
          pdf:docinfo:creator_tool,Adobe InDesign 15.1 (Macintosh)
          access_permission:fill_in_form,true
          pdf:docinfo:modified,2020-09-24T23:38:28Z
          pdf:hasCollection,false
          pdf:encrypted,false
          pdf:hasMarkedContent,true
          Content-Type,application/pdf
          dc:language,en-US
          pdf:producer,Adobe PDF Library 15.0
          access_permission:extract_for_accessibility,true
          access_permission:assemble_document,true
          xmpTPg:NPages,14
          pdf:hasXMP,true
          pdf:charsPerPage,84,676,1653,1914,814,1022,645,1221,1087,732,887,1295,1263,149
          access_permission:extract_content,true
          xmpMM:DerivedFrom:DocumentID,xmp.did:b98726d4-04c4-48f5-88be-0a48a0074356
          access_permission:can_print,true
          pdf:docinfo:trapped,false
          X-TIKA:Parsed-By,org.apache.tika.parser.DefaultParser,org.apache.tika.parser.pdf.PDFParser
          xmpMM:DerivedFrom:InstanceID,xmp.iid:3dd6a91f-a114-4d63-804e-e2b749c15075
          pdf:annotationTypes,null
          access_permission:can_modify,true
          pdf:docinfo:producer,Adobe PDF Library 15.0
          pdf:docinfo:created,2020-08-13T14:55:46Z
          pdf:annotationSubtypes,Link

and integrates well into my dovecot fts instance

next, preparing for some customization, and with the goal of first simply
replicating the default prior to any mods, i created an override config by
cloning the jar's default.

        jar xf tika-server.jar
        cp ./tika-server-config-default.xml ./tika-server-config-custom.xml

and edit

        /etc/systemd/system/tika.service
                ...
                ExecStart=/usr/bin/java \
                 -Dpdfbox.fontcache=/var/tika \
                 -jar /srv/tika/tika-server.jar \
+                -c /srv/tika/tika-server-config-custom.xml \
                 --host 127.0.0.1 \
                 --port 9998

on relaunch, any attempted use, e.g. dovecot indexing, fails with a refused
connection

        2022-05-24 11:51:26 indexer-worker([email protected])<k49UAP3+jAQKXQCI+IOfAw:rXI+Ev7+jGKKXQAA+IOfAw>: Error: fts_tika: PUT http://127.0.0.1:9998/tika/ failed: connect(127.0.0.1:9998) failed: Connection refused

no other changes have been made, other than specifying the config file.

removal of the config file spec gets it working again.
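
for what it's worth, a refused connection could mean either that the server never bound the port or that it exited (or is restart-looping) right after launch. a rough way to tell these apart is to probe the port directly, independent of dovecot. the sketch below assumes bash (it relies on bash's /dev/tcp pseudo-path), and the helper names port_open and port_state are made up for illustration; they are not part of tika or dovecot.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; needs bash for /dev/tcp.

# Succeeds (exit 0) if a TCP connection to host $1, port $2 can be opened.
port_open() {
    # The subshell opens fd 3 on bash's /dev/tcp pseudo-path; the
    # descriptor is closed again when the subshell exits.
    ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

# Prints "open" or "closed" for the given host and port.
port_state() {
    if port_open "$1" "$2"; then
        echo open
    else
        echo closed
    fi
}
```

running `port_state 127.0.0.1 9998` shortly after `systemctl restart tika` would show whether anything is listening; if it prints "closed", `journalctl -u tika -n 50 --no-pager` should show whatever the JVM logged when started with the config.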

i assume that /srv/tika/tika-server-config-custom.xml, as extracted, is not
sufficient on its own for tika-server to launch?

what do i need to change to get to a default-launch state WITH a config
specified, one that i can eventually modify?
