The Checkstyle violation is only about coding style. To get past it,
either delete that plugin section in tika-parent/pom.xml, or add
<skip>true</skip> below "<configuration>" in that plugin. The same
applies to the ossindex-maven-plugin and the forbiddenapis plugin.
If the debugger didn't stop, then the breakpoint was in the wrong place,
or it's not possible to debug this setup.
Re "is there anything informative in that now-more-verbose DEBUG output?":
well yes, the MD5 output. This proves that the file is different. (OK,
the different length showed that too.)
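To illustrate the comparison, here is a minimal standalone sketch of checking two byte sequences by length and MD5. It is not Tika's calcMD5; the class Md5Compare and the methods md5Hex and sameContent are hypothetical names chosen for this example, and the inputs are stand-in strings rather than real PDF data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: compares content by length and MD5.
class Md5Compare {

    // Returns the MD5 digest of the data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A differing length already proves a difference; the digest confirms it.
    static boolean sameContent(byte[] a, byte[] b) {
        return a.length == b.length && md5Hex(a).equals(md5Hex(b));
    }

    public static void main(String[] args) {
        byte[] original = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] truncated = "ab".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(original));                 // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(sameContent(original, truncated)); // false
    }
}
```

With real files, the byte arrays would come from Files.readAllBytes on the original attachment and on the file Tika received.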
Tilman
On 19.07.2022 at 11:37, PGNet Dev wrote:
On 7/18/22 11:05 PM, Tilman Hausherr wrote:
Yes the file is deleted...
Alternatively, grab the source code from the trunk, and add this line
in the file
tika-main\tika-parsers\tika-parsers-standard\tika-parsers-standard-modules\tika-parser-pdf-module\src\main\java\org\apache\tika\parser\pdf\PDFParser.java
Files.write(Paths.get("/tmp/yourfile.pdf"),
Files.readAllBytes(tstream.getPath()));
after the line that has ", md5: ".
Then build the parser module, and then the standard server subproject
with "mvn -DskipTests install".
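The idea behind that added line can be tried outside Tika. The following is only a sketch under assumptions: the class TempFileCapture and its methods are hypothetical names, the temp file stands in for the one Tika creates, and Files.copy with REPLACE_EXISTING is swapped in for the Files.write/Files.readAllBytes pair, so that a second run overwrites the earlier capture without loading the file into memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical standalone demo, not Tika code.
class TempFileCapture {

    // Copies the temp file to a location that outlives it.
    static void capture(Path source, Path target) {
        try {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a stand-in temp file, captures it, deletes the original,
    // and reports whether the captured copy still holds the same bytes.
    static boolean survivesDeletion() {
        try {
            byte[] content = {1, 2, 3};
            Path temp = Files.createTempFile("apache-tika-", ".tmp");
            Path kept = Files.createTempFile("captured-", ".pdf");
            Files.write(temp, content);
            capture(temp, kept);
            Files.delete(temp); // mimics the cleanup of the temp file after parsing
            boolean ok = Files.exists(kept)
                    && Arrays.equals(content, Files.readAllBytes(kept));
            Files.delete(kept);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesDeletion()); // true
    }
}
```

The copy has to happen while the temp file still exists, which is why the line goes right next to the debug statement that logs its path.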
First, attempting the build fails:
cd src/tika
EDIT
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
...
if (LOG.isDebugEnabled() && tstream != null) {
    LOG.debug("File: " + tstream.getPath() + ", length: "
            + tstream.getLength() +
            ", md5: " + calcMD5(tstream.getPath()));
+   Files.write(Paths.get("/tmp/yourfile.pdf"),
            Files.readAllBytes(tstream.getPath()));
}
...
mvn install -pl tika-parsers -am
mvn -DskipTests install
...
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 31.493 s
[INFO] Finished at: 2022-07-19T04:48:43-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (validate) on project tika-parser-pdf-module: You have 1 Checkstyle violation. -> [Help 1]
try setting a breakpoint in org.apache.tika.parser.pdf.PDFParser so
that you get that file.
next, run in debugger instead,
sudo -u tika /usr/bin/jdb \
-classpath /srv/tika/tika-server.jar \
org.apache.tika.server.core.TikaServerCli \
-c /etc/tika/tika-server-config-custom.xml
Initializing jdb ...
set breakpoint
> stop in org.apache.tika.parser.pdf.PDFParser
Deferring breakpoint org.apache.tika.parser.pdf.PDFParser.
It will be set after the class is loaded.
run it
> run
run org.apache.tika.server.core.TikaServerCli -c /etc/tika/tika-server-config-custom.xml
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: DEBUG [pool-2-thread-1] 05:21:37,469
org.apache.tika.server.core.TikaServerWatchDog forked process
commandline: [/usr/bin/java, -Xms1g, -Xmx1g,
-Dpdfbox.fontcache=/var/tika, -Dlog4j2.debug,
-Djava.awt.headless=true, -cp, /srv/tika/tika-server.jar,
-Dtika.server.id=, org.apache.tika.server.core.TikaServerProcess, -h,
127.0.0.1, -p, 9998, -i, , -c,
/etc/tika/tika-server-config-custom.xml, -forkedStatusFile,
/tmp/apache-tika-server-forked-tmp-11335114907490900739, -numRestarts, 0]
...
DEBUG [main] 05:21:50,871 org.apache.cxf.endpoint.ServerImpl
register the server to serverRegistry
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor
class org.apache.tika.server.core.ServerStatusWatcher
INFO [main] 05:21:50,906
org.apache.tika.server.core.TikaServerProcess Started Apache Tika
server at http://127.0.0.1:9998/
receive email+attachment
*lots* of debug logs @ jdb console,
-> https://pastebin.com/HDtR9RKP
NOTE, there,
...
DEBUG [qtp485047320-31] 05:22:58,423 org.apache.tika.parser.pdf.PDFParser File: /tmp/apache-tika-11251774738482156793.tmp, length: 104932, md5: 092bf24b2cac33fac27965549c99613a
...
but, no file captured
ls -al /tmp/apache-tika*tmp
ls: cannot access '/tmp/apache-tika*tmp': No such file or directory
is there anything informative in that now-more-verbose DEBUG output?