On 7/16/22 10:51 PM, Tilman Hausherr wrote:
You didn't get the exception I mentioned; then set the breakpoint at parse() to
get the fileLen. The current error message suggests that bytes have been
changed or have been lost.
IIRC tika saves the PDF in a file in the temp directory before parsing, maybe
look there at that time and compare the length and content with your own.
i haven't managed to stop at any *.parse breakpoint i set after `jdb -attach`.
wondering whether the required debug info is included/complete in the runnable
jar, i decided to try a clean mvn build:
git checkout 2.4.1
mvn clean
mvn -X compile -am -pl :tika-server-standard
which fails
...
[DEBUG] 82 component-reports; 16.90 ms
[WARNING] Excluding coordinates: com.google.guava:guava:31.1-jre
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Tika parent 2.4.1:
[INFO]
[INFO] Apache Tika parent ................................. SUCCESS [  0.790 s]
[INFO] Apache Tika core ................................... SUCCESS [  4.806 s]
[INFO] Apache Tika serialization .......................... SUCCESS [  0.698 s]
[INFO] Apache Tika parser modules ......................... SUCCESS [  0.045 s]
[INFO] Apache Tika standard parser modules and package .... SUCCESS [  0.033 s]
[INFO] Apache Tika standard parser modules ................ SUCCESS [  0.030 s]
[INFO] Apache Tika html commons ........................... SUCCESS [  0.114 s]
[INFO] Apache Tika digest commons ......................... SUCCESS [  0.154 s]
[INFO] Apache Tika mail commons ........................... SUCCESS [  0.078 s]
[INFO] Apache Tika XMP commons ............................ SUCCESS [  0.120 s]
[INFO] Apache Tika ZIP commons ............................ SUCCESS [  0.213 s]
[INFO] Apache Tika image parser module .................... SUCCESS [  0.355 s]
[INFO] Apache Tika OCR parser module ...................... SUCCESS [  0.302 s]
[INFO] Apache Tika audiovideo parser module ............... SUCCESS [  0.369 s]
[INFO] Apache Tika text parser module ..................... SUCCESS [  0.424 s]
[INFO] Apache Tika code parser module ..................... SUCCESS [  0.205 s]
[INFO] Apache Tika html parser module ..................... SUCCESS [  0.305 s]
[INFO] Apache Tika font parser module ..................... SUCCESS [  0.078 s]
[INFO] Apache Tika XML parser module ...................... SUCCESS [  0.132 s]
[INFO] Apache Tika Microsoft parser module ................ SUCCESS [  2.600 s]
[INFO] Apache Tika package parser module .................. SUCCESS [  0.145 s]
[INFO] Apache Tika PDF parser module ...................... SUCCESS [  0.667 s]
[INFO] Apache Tika Apple parser module .................... SUCCESS [  0.216 s]
[INFO] Apache Tika cad parser module ...................... SUCCESS [  0.203 s]
[INFO] Apache Tika mail parser module ..................... SUCCESS [  0.187 s]
[INFO] Apache Tika miscellaneous office format parser module SUCCESS [  0.421 s]
[INFO] Apache Tika news parser module ..................... SUCCESS [  0.163 s]
[INFO] Apache Tika crypto parser module ................... SUCCESS [  0.106 s]
[INFO] Apache Tika WARC parser module ..................... SUCCESS [  0.104 s]
[INFO] Apache Tika standard parser package ................ SUCCESS [  0.565 s]
[INFO] Apache Tika XMP .................................... SUCCESS [  0.286 s]
[INFO] Apache Tika language detection ..................... SUCCESS [  0.021 s]
[INFO] Apache Tika langdetect test commons ................ SUCCESS [  0.057 s]
[INFO] Apache Tika Optimaize langdetect ................... SUCCESS [  0.108 s]
[INFO] Apache Tika OpenNLP langdetect ..................... SUCCESS [  0.114 s]
[INFO] Apache Tika pipes .................................. SUCCESS [  0.018 s]
[INFO] Apache Tika emitters ............................... SUCCESS [  0.017 s]
[INFO] Apache Tika filesystem emitter ..................... SUCCESS [  0.065 s]
[INFO] Apache Tika translate .............................. SUCCESS [  0.446 s]
[INFO] Apache Tika server module .......................... SUCCESS [  0.019 s]
[INFO] Apache Tika server core ............................ FAILURE [  0.112 s]
[INFO] Apache Tika standard server ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.545 s
[INFO] Finished at: 2022-07-17T09:41:53-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (audit-dependencies) on project tika-server-core: Detected 2 vulnerable components:
[ERROR] org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR] * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR] org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR] * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
[ERROR]
[ERROR] Excluded coordinates:
[ERROR] - com.google.guava:guava:31.1-jre
[ERROR]
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (audit-dependencies) on project tika-server-core: Detected 2 vulnerable components:
org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
* [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
* [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Excluded coordinates:
- com.google.guava:guava:31.1-jre
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:972)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
at java.lang.reflect.Method.invoke (Method.java:577)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoFailureException: Detected 2 vulnerable components:
org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
* [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
* [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Excluded coordinates:
- com.google.guava:guava:31.1-jre
at org.sonatype.ossindex.maven.plugin.AuditMojoSupport.execute (AuditMojoSupport.java:257)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:972)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
at java.lang.reflect.Method.invoke (Method.java:577)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :tika-server-core
checking @ https://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

"Unlike many other errors, this exception is not generated by the Maven core
itself but by a plugin. As a rule of thumb, plugins use this error to signal a
failure of the build because there is something wrong with the dependencies or
sources of a project, e.g. a compilation or a test failure."
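
Since the failure above comes from the ossindex audit plugin and not from compilation, one possible workaround for a local debug build is to skip the audit. This is only a sketch: it assumes the plugin honours the `ossindex.skip` user property (worth confirming against the plugin docs), and it uses `package` instead of `compile`, since `compile` alone does not assemble a runnable jar. `build_tika_server` is just an illustrative wrapper name, and `-DskipTests` is optional.

```shell
# Sketch only: rebuild tika-server-standard with the OSS Index audit skipped.
# ASSUMPTION: the audit plugin reads the user property `ossindex.skip`.
build_tika_server() {
    # `package` (not `compile`) is what assembles the jar;
    # any extra arguments, e.g. -X, are passed straight through to mvn.
    mvn clean package -DskipTests -Dossindex.skip=true \
        -am -pl :tika-server-standard "$@"
}

# usage, from the root of the 2.4.1 checkout:
#   build_tika_server -X
```

Skipping the audit only silences the check for this build; the flagged jetty 9.4.46 artifacts are still the ones that get bundled.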
in /tmp, immediately after tika-server start:

/usr/bin/tree -Csup --timefmt "%F %R:%S %z" /tmp | grep tika
├── [-rw------- tika 0 2022-07-17 09:54:08 -0400] apache-tika-server-forked-tmp-16337036696243797817
├── [drwxr-xr-x tika 80 2022-07-17 09:54:08 -0400] hsperfdata_tika
│ ├── [-rw------- tika 32768 2022-07-17 09:54:04 -0400] 15865
│ └── [-rw------- tika 32768 2022-07-17 09:54:08 -0400] 15902

and the same (i.e. nothing added) after receipt of an email with a failed tika
scan/parse.
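
If a temp copy of the PDF does turn up (possibly under a different `java.io.tmpdir`), the length/content comparison suggested in the quoted message could be sketched like this. `compare_pdf` and both paths in the usage line are placeholders, not real tika file names.

```shell
# Sketch: compare the original PDF with whatever copy the server wrote.
# Prints both byte counts, then lets cmp report the first differing byte.
compare_pdf() {
    printf 'sizes: %s vs %s bytes\n' "$(wc -c < "$1")" "$(wc -c < "$2")"
    # cmp exits 0 only when the two files are byte-for-byte identical
    cmp "$1" "$2"
}

# usage (hypothetical paths):
#   compare_pdf ./original.pdf /tmp/some-tika-temp-file
```

A size mismatch would point at truncated or lost bytes, while equal sizes with a `cmp` difference would point at altered bytes.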
does anyone have explicit instructions for setting a catchable breakpoint in a
`jdb -attach` session against tika-server?
or error-free build instructions?
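
One possible reason the breakpoints never fire: the /tmp listing shows a `forked-tmp` file and two hsperfdata pids, which suggests two JVMs, so parsing may be happening in a forked child while jdb is attached to the parent. A rough sketch of a single-JVM debug session follows. Assumptions to verify: port 5005 and the jar name are placeholders, `-noFork` is assumed to be the tika-server 2.x option that disables forking (check `java -jar <jar> --help`), and the `parse()`/`fileLen` mentioned in the quoted message are assumed to live in PDFBox's `org.apache.pdfbox.pdfparser.PDFParser`.

```shell
# Sketch only; names and port are placeholders.
JDWP_PORT=5005

# Build the JDWP agent option. suspend=y holds the JVM until a debugger
# attaches, so breakpoints can be set before any document is parsed.
jdwp_opts() {
    printf -- '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=localhost:%s' "$1"
}

# Start the server in one JVM. ASSUMPTION: -noFork keeps parsing in-process.
start_tika_debug() {
    java "$(jdwp_opts "$JDWP_PORT")" -jar "$1" -noFork
}

# terminal 1:
#   start_tika_debug tika-server-standard-2.4.1.jar
# terminal 2:
#   jdb -attach localhost:5005
#   > stop in org.apache.pdfbox.pdfparser.PDFParser.parse
#   > cont
#   ... send the failing PDF; once the breakpoint hits:
#   > print fileLen
```

If the class is not loaded yet, jdb should report the breakpoint as deferred and set it when the class loads.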