On 7/16/22 10:51 PM, Tilman Hausherr wrote:
> You didn't get the exception I mentioned; then set the breakpoint at parse() to
> get the fileLen. The current error message suggests that bytes have been
> changed or have been lost.
>
> IIRC tika saves the PDF in a file in the temp directory before parsing, maybe
> look there at that time and compare the length and content with your own.
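
The length-and-content comparison suggested above could be scripted roughly like this. It is only a sketch: the function name `compare_files` and both path arguments are placeholders, not anything tika provides.

```shell
#!/bin/sh
# Compare two files by byte length and by content.
# $1 = the PDF as originally sent (placeholder path)
# $2 = the copy found in the temp directory (placeholder path)
compare_files() {
    orig=$1
    copy=$2
    # wc -c prints the byte count; tr strips padding some platforms add
    orig_len=$(wc -c < "$orig" | tr -d ' ')
    copy_len=$(wc -c < "$copy" | tr -d ' ')
    echo "length: $orig_len vs $copy_len"
    if cmp -s "$orig" "$copy"; then
        echo "content: identical"
    else
        echo "content: differs"
        # report the offset of the first differing byte
        cmp "$orig" "$copy" || true
        return 1
    fi
}
```

It would be called as `compare_files <original.pdf> <temp copy>`; a non-zero return means the bytes changed somewhere between the sender and the parser.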


i haven't managed to stop at any *.parse breakpoint i set after `jdb -attach`

wondering if req'd debug info is included/complete in the runnable jar, i 
decided to try a clean mvn build

        git checkout 2.4.1
        mvn clean
        mvn -X compile -am -pl :tika-server-standard

which fails

        ...
        [DEBUG] 82 component-reports; 16.90 ms
        [WARNING] Excluding coordinates: com.google.guava:guava:31.1-jre
        [INFO] ------------------------------------------------------------------------
        [INFO] Reactor Summary for Apache Tika parent 2.4.1:
        [INFO]
        [INFO] Apache Tika parent ................................. SUCCESS [  0.790 s]
        [INFO] Apache Tika core ................................... SUCCESS [  4.806 s]
        [INFO] Apache Tika serialization .......................... SUCCESS [  0.698 s]
        [INFO] Apache Tika parser modules ......................... SUCCESS [  0.045 s]
        [INFO] Apache Tika standard parser modules and package .... SUCCESS [  0.033 s]
        [INFO] Apache Tika standard parser modules ................ SUCCESS [  0.030 s]
        [INFO] Apache Tika html commons ........................... SUCCESS [  0.114 s]
        [INFO] Apache Tika digest commons ......................... SUCCESS [  0.154 s]
        [INFO] Apache Tika mail commons ........................... SUCCESS [  0.078 s]
        [INFO] Apache Tika XMP commons ............................ SUCCESS [  0.120 s]
        [INFO] Apache Tika ZIP commons ............................ SUCCESS [  0.213 s]
        [INFO] Apache Tika image parser module .................... SUCCESS [  0.355 s]
        [INFO] Apache Tika OCR parser module ...................... SUCCESS [  0.302 s]
        [INFO] Apache Tika audiovideo parser module ............... SUCCESS [  0.369 s]
        [INFO] Apache Tika text parser module ..................... SUCCESS [  0.424 s]
        [INFO] Apache Tika code parser module ..................... SUCCESS [  0.205 s]
        [INFO] Apache Tika html parser module ..................... SUCCESS [  0.305 s]
        [INFO] Apache Tika font parser module ..................... SUCCESS [  0.078 s]
        [INFO] Apache Tika XML parser module ...................... SUCCESS [  0.132 s]
        [INFO] Apache Tika Microsoft parser module ................ SUCCESS [  2.600 s]
        [INFO] Apache Tika package parser module .................. SUCCESS [  0.145 s]
        [INFO] Apache Tika PDF parser module ...................... SUCCESS [  0.667 s]
        [INFO] Apache Tika Apple parser module .................... SUCCESS [  0.216 s]
        [INFO] Apache Tika cad parser module ...................... SUCCESS [  0.203 s]
        [INFO] Apache Tika mail parser module ..................... SUCCESS [  0.187 s]
        [INFO] Apache Tika miscellaneous office format parser module SUCCESS [  0.421 s]
        [INFO] Apache Tika news parser module ..................... SUCCESS [  0.163 s]
        [INFO] Apache Tika crypto parser module ................... SUCCESS [  0.106 s]
        [INFO] Apache Tika WARC parser module ..................... SUCCESS [  0.104 s]
        [INFO] Apache Tika standard parser package ................ SUCCESS [  0.565 s]
        [INFO] Apache Tika XMP .................................... SUCCESS [  0.286 s]
        [INFO] Apache Tika language detection ..................... SUCCESS [  0.021 s]
        [INFO] Apache Tika langdetect test commons ................ SUCCESS [  0.057 s]
        [INFO] Apache Tika Optimaize langdetect ................... SUCCESS [  0.108 s]
        [INFO] Apache Tika OpenNLP langdetect ..................... SUCCESS [  0.114 s]
        [INFO] Apache Tika pipes .................................. SUCCESS [  0.018 s]
        [INFO] Apache Tika emitters ............................... SUCCESS [  0.017 s]
        [INFO] Apache Tika filesystem emitter ..................... SUCCESS [  0.065 s]
        [INFO] Apache Tika translate .............................. SUCCESS [  0.446 s]
        [INFO] Apache Tika server module .......................... SUCCESS [  0.019 s]
        [INFO] Apache Tika server core ............................ FAILURE [  0.112 s]
        [INFO] Apache Tika standard server ........................ SKIPPED
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD FAILURE
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time:  16.545 s
        [INFO] Finished at: 2022-07-17T09:41:53-04:00
        [INFO] ------------------------------------------------------------------------
        [ERROR] Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (audit-dependencies) on project tika-server-core: Detected 2 vulnerable components:
        [ERROR]   org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
        [ERROR]     * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
        [ERROR]   org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
        [ERROR]     * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
        [ERROR]
        [ERROR] Excluded coordinates:
        [ERROR]   - com.google.guava:guava:31.1-jre
        [ERROR]
        [ERROR] -> [Help 1]
        org.apache.maven.lifecycle.LifecycleExecutionException: Failed to 
execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit 
(audit-dependencies) on project tika-server-core: Detected 2 vulnerable 
components:
          org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; 
https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
            * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); 
https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
          org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; 
https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
            * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); 
https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1

        Excluded coordinates:
          - com.google.guava:guava:31.1-jre

            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:215)
            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
            at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
            at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
            at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
            at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
            at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
            at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
            at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
            at org.apache.maven.cli.MavenCli.execute (MavenCli.java:972)
            at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
            at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
            at jdk.internal.reflect.DirectMethodHandleAccessor.invoke 
(DirectMethodHandleAccessor.java:104)
            at java.lang.reflect.Method.invoke (Method.java:577)
            at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
(Launcher.java:282)
            at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
(Launcher.java:225)
            at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
(Launcher.java:406)
            at org.codehaus.plexus.classworlds.launcher.Launcher.main 
(Launcher.java:347)
        Caused by: org.apache.maven.plugin.MojoFailureException: Detected 2 
vulnerable components:
          org.eclipse.jetty:jetty-server:jar:9.4.46.v20220331:compile; 
https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-server@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
            * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); 
https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-server&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
          org.eclipse.jetty:jetty-http:jar:9.4.46.v20220331:compile; 
https://ossindex.sonatype.org/component/pkg:maven/org.eclipse.jetty/jetty-http@9.4.46.v20220331?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
            * [CVE-2022-2047] CWE-20: Improper Input Validation (2.7); 
https://ossindex.sonatype.org/vulnerability/CVE-2022-2047?component-type=maven&component-name=org.eclipse.jetty%2Fjetty-http&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1

        Excluded coordinates:
          - com.google.guava:guava:31.1-jre

            at org.sonatype.ossindex.maven.plugin.AuditMojoSupport.execute 
(AuditMojoSupport.java:257)
            at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
(DefaultBuildPluginManager.java:137)
            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:210)
            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
            at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
            at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
            at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
            at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
            at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
            at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
            at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
            at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
            at org.apache.maven.cli.MavenCli.execute (MavenCli.java:972)
            at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
            at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
            at jdk.internal.reflect.DirectMethodHandleAccessor.invoke 
(DirectMethodHandleAccessor.java:104)
            at java.lang.reflect.Method.invoke (Method.java:577)
            at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
(Launcher.java:282)
            at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
(Launcher.java:225)
            at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
(Launcher.java:406)
            at org.codehaus.plexus.classworlds.launcher.Launcher.main 
(Launcher.java:347)
        [ERROR]
        [ERROR]
        [ERROR] For more information about the errors and possible solutions, 
please read the following articles:
        [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
        [ERROR]
        [ERROR] After correcting the problems, you can resume the build with 
the command
        [ERROR]   mvn <args> -rf :tika-server-core

checking @

        https://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

                "Unlike many other errors, this exception is not generated by the
                Maven core itself but by a plugin. As a rule of thumb, plugins use
                this error to signal a failure of the build because there is
                something wrong with the dependencies or sources of a project,
                e.g. a compilation or a test failure."
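
Since the failure comes from the ossindex audit plugin rather than from compilation, one possible workaround is to skip the audit for a local debug build. This is a sketch only: it assumes the plugin honors its `ossindex.skip` user property in this build, and it leaves the reported jetty CVE unaddressed, so it only makes sense for local debugging.

```shell
# same build as before, with the dependency audit skipped
# (assumption: ossindex-maven-plugin reads the ossindex.skip property)
git checkout 2.4.1
mvn clean
mvn compile -am -pl :tika-server-standard -Dossindex.skip=true
```

If the property is not picked up, the plugin's own documentation for the `audit` goal should list the supported parameter names.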

in /tmp

immediately after tika-server start

        /usr/bin/tree -Csup --timefmt "%F %R:%S %z" /tmp | grep tika
                ├── [-rw------- tika               0 2022-07-17 09:54:08 -0400]  apache-tika-server-forked-tmp-16337036696243797817
                ├── [drwxr-xr-x tika              80 2022-07-17 09:54:08 -0400]  hsperfdata_tika
                │   ├── [-rw------- tika           32768 2022-07-17 09:54:04 -0400]  15865
                │   └── [-rw------- tika           32768 2022-07-17 09:54:08 -0400]  15902

and the same (i.e. nothing added) after receipt of an email with a failed tika
scan/parse

anyone have some explicit instructions for setting a catchable breakpoint in a 
jdb -attach to tika-server?
or, error-free build instructions?
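
For reference, the kind of attach sequence being asked about might look like the sketch below. Everything in it is an assumption to be checked against the actual setup: the port number is arbitrary, `--noFork` is assumed to be accepted by this tika-server version (the `apache-tika-server-forked-tmp-*` file in /tmp suggests parsing happens in a forked child, which a debugger attached to the parent would never see), and the breakpoint target assumes the parse() mentioned above is PDFBox's `org.apache.pdfbox.pdfparser.PDFParser.parse`.

```shell
# 1. start the server with a JDWP agent, suspended until a debugger attaches
#    (port 5005 is arbitrary; --noFork is an assumption, check --help)
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
     -jar tika-server-standard-2.4.1.jar --noFork

# 2. in a second terminal, attach to that port
jdb -attach 5005

# 3. at the jdb prompt, set the breakpoint, then resume the suspended VM:
#      stop in org.apache.pdfbox.pdfparser.PDFParser.parse
#      cont
#
# 4. once the breakpoint is hit, inspect the stack and the parser state:
#      where
#      print this.fileLen
```

With `suspend=y` the JVM waits for the debugger, so the breakpoint is registered before any request is handled; jdb defers it until the class is loaded.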
