[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079164#comment-16079164 ] Gus Heck commented on TIKA-1367: Update... I did start making test cases, but this may have been a matter of my Artifactory instance behaving strangely. I'm seeing different results with mavenCentral() directly... it was supposedly proxying Central transparently, but perhaps not quite. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.17 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitive dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076440#comment-16076440 ] Gus Heck commented on TIKA-1367: I was using gradle, not maven, and I am assuming that gradle's dependency task (which should operate the same as maven's) does not have a bug. You can see the pasted output above... I'll do a more rigorous test, creating and packing actual shade/shadow jars and provide example projects in maven and gradle (or alternately I'll discover what it is I'm not understanding), hopefully this evening or tomorrow. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076162#comment-16076162 ] Andreas Hubold commented on TIKA-1367: -- Yes, true. I was referring to [~gus_heck]'s comment, where he wrote that dependencies were completely lost in Maven when upgrading from 1.12 to 1.15. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076150#comment-16076150 ] Manfred Schenk commented on TIKA-1367: -- If I understood the original ticket correctly, the problem was not access to the dependency:tree output (which can easily be created with the well-known tools) but the lack of documentation stating which dependencies are absolutely required and which are needed only for certain functionality. Since Tika has a huge number of dependencies, version conflicts are very likely in real-world projects. If some of these dependencies were documented as optional (e.g. "optional, only needed for accessing the content of files of type xyz"), the developer could more easily decide to exclude them from the dependency tree and forgo support for type xyz. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitive dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
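To make the version-conflict point above concrete, here is a minimal sketch (not part of Tika or Gradle) that scans the text printed by `gradle dependencies` for entries of the form `group:artifact:requested -> resolved`, which is how Gradle marks a dependency whose selected version differs from the requested one. The class name `GradleConflictScan` and the method name are made up for illustration, and the sample lines in `main` are shaped like the Gradle listings quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or Gradle.
final class GradleConflictScan {

    // Matches "group:artifact:requested -> resolved" anywhere in a report line.
    private static final Pattern CONFLICT =
            Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):(\\S+) -> (\\S+)");

    // Returns one "group:artifact requested -> resolved" entry per matching line.
    static List<String> conflicts(List<String> reportLines) {
        List<String> found = new ArrayList<>();
        for (String line : reportLines) {
            Matcher m = CONFLICT.matcher(line);
            if (m.find()) {
                found.add(m.group(1) + ":" + m.group(2) + " "
                        + m.group(3) + " -> " + m.group(4));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample lines in the shape of `gradle dependencies` output.
        List<String> sample = Arrays.asList(
                "+--- org.apache.tika:tika-parsers:1.13",
                "|    +--- org.gagravarr:vorbis-java-tika:0.8",
                "|    |    \\--- org.apache.tika:tika-core:1.12 -> 1.13",
                "|    +--- com.google.code.gson:gson:2.2.4 -> 2.4");
        conflicts(sample).forEach(System.out::println);
    }
}
```

This only reports where the build tool already picked a different version; it says nothing about whether the selected version is actually compatible with the parser that asked for the older one.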
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076137#comment-16076137 ] Andreas Hubold commented on TIKA-1367: -- mvn dependency:tree lists the dependencies of tika-parsers 1.15 for me. They are also correctly listed in the pom available on Maven central: http://central.maven.org/maven2/org/apache/tika/tika-parsers/1.15/tika-parsers-1.15.pom Maybe you've somehow installed a wrong pom into your maven repository? > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075614#comment-16075614 ] Gus Heck commented on TIKA-1367: Maven Shade, Gradle Shadow and OneJar plugins for both Maven and Gradle all rely on the dependency graph which shows as empty in my output from Gradle above, so I don't expect it to matter which one I use. I'll try to post minimal examples of each. I'll look at the App jar, but that sounds like it will require unpacking into a local temp dir, and some horrible amount of custom dependency definition omitting group names to account for the lack of a maven/ivy structure in the directory to which it is unpacked... I can of course just manually list all of tika's deps, (after I figure out what they are by reading the build in detail)... but that's horribly brittle, error prone and tedious, and more or less a key reason maven exists in the first place. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074648#comment-16074648 ] Nick Burch commented on TIKA-1367: -- [~talli...@mitre.org] I'm not sure there is - we've fixed it in Tika 2.0, and just need to get the other breaking 2.x changes done so we can release. The "easy but hacky" way to get all of Tika in a jar is to just use the Tika App jar. The Tika Bundle ships all the dependencies, and thus provides a single jar with everything in, for those who use OSGi. Otherwise, IIRC the Maven Shade plugin will give you a single jar with all deps inlined if you ask it to based on the Tika Parser dependency (that's how the Tika App build works), or for Gradle users I believe that the shadowJar plugin is the currently recommended way to do the same. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitive dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074615#comment-16074615 ] Tim Allison commented on TIKA-1367: --- Fellow devs, is this something we need to fix for 1.16? > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074179#comment-16074179 ] Gus Heck commented on TIKA-1367: Compilation seems to be solved by using core directly, but still no deps...
{code}
compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-core:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|    +--- commons-io:commons-io:2.4
{code}
> Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074175#comment-16074175 ] Gus Heck commented on TIKA-1367: So when the dust settles here, how will one build a coherent, workable one-jar application that supports code like this, which intends to make a best effort to parse any document that might be encountered:
{code}
Tika tika = new Tika();
tika.setMaxStringLength(document.getRawData().length);
Metadata metadata = new Metadata();
try (ByteArrayInputStream bais = new ByteArrayInputStream(rawData)) {
    String textContent = tika.parseToString(bais, metadata);
    document.setRawData(textContent.getBytes(Charset.forName("UTF-8")));
    for (String name : metadata.names()) {
        document.put(sanitize(name) + plusSuffix(), metadata.get(name));
    }
} catch (IOException | TikaException e) {
    log.warn("Tika processing failure!", e);
    // if tika can't parse it we certainly don't want random binary crap in the index
    document.setStatus(Status.DROPPED);
}
{code}
Although I notice that this is not marked as fixed yet, in 1.15 the above code no longer compiles... (and somehow there are no dependencies reported by gradle...)
{code}
compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-parsers:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|    +--- commons-io:commons-io:2.4
{code}
vs
{code}
+--- org.apache.tika:tika-parsers:1.12
|    +--- org.apache.tika:tika-core:1.12
|    +--- org.gagravarr:vorbis-java-tika:0.6
|    |    \--- org.apache.tika:tika-core:1.5 -> 1.12
|    +--- com.healthmarketscience.jackcess:jackcess:2.1.2
{code}
Which seems very much like it's totally going to break everything... if gradle doesn't see the deps, one-jar won't package them (all I did was change a 1.12 to a 1.15 in the gradle build to cause this) > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.16 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitive dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
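One way to check whether a packaged one-jar really contains the parser libraries, independent of what the build tool reports, is to probe the runtime classpath for a representative class from each library. The sketch below is an illustration only: `ParserLibraryProbe` is a made-up name, and the three class names in `main` are sample choices (one class each from tika-core, PDFBox and POI), not a list of what Tika requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Tika.
final class ParserLibraryProbe {

    // True if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ParserLibraryProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Probes each class name and records the result in input order.
    static Map<String, Boolean> probe(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isOnClasspath(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample class names only; adjust to the formats the application must support.
        probe("org.apache.tika.Tika",
              "org.apache.pdfbox.pdmodel.PDDocument",
              "org.apache.poi.hssf.usermodel.HSSFWorkbook")
            .forEach((name, present) ->
                System.out.println((present ? "found    " : "MISSING  ") + name));
    }
}
```

Run from inside the packaged jar, a "MISSING" line for a library that the 1.12 tree listed would suggest the packaging step never saw that dependency.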
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409311#comment-15409311 ] Nick Burch commented on TIKA-1367: -- The code exists and you can check out the more modular parsers already, see http://wiki.apache.org/tika/Tika2_0RoadMap . The main blocker still is coming up with a way to reset or change the sax stream when multiple parsers are used for one document. That needs solving before we can finalise the 2.x API > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.14 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409282#comment-15409282 ] Manfred Schenk commented on TIKA-1367: -- Is there a roadmap for version 2.0? > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.14 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitivie dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409272#comment-15409272 ] Nick Burch commented on TIKA-1367: -- This should be largely fixed on the 2.x branch, which has more modular parsers. That lets you see which dependencies are needed for which logical groupings of parsers. Given that, and given the refactoring needed to support this, I'm minded to say we won't fix it on 1.x and you'll need to move to 2.x if you need that level of detail. > Tika documentation should list tika-parsers parser dependencies > --- > > Key: TIKA-1367 > URL: https://issues.apache.org/jira/browse/TIKA-1367 > Project: Tika > Issue Type: Improvement > Components: documentation >Reporter: Sergey Beryozkin > Fix For: 1.14 > > > tika-parsers module has many strong transitive parser dependencies. Maven > users of tika-parsers have to exclude all the transitive dependencies > manually. Documenting the list of the existing transitive dependencies and > keeping the list up to date will help developers exclude the libraries not > needed for a given project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409222#comment-15409222 ] Manfred Schenk commented on TIKA-1367: -- Definitely, a lot more detailed information about the dependencies is needed for tika-parsers. This is the current list of dependencies which is pulled by gradle when using tika-parsers: +--- org.apache.tika:tika-parsers:1.13 |+--- org.apache.tika:tika-core:1.13 |+--- org.gagravarr:vorbis-java-tika:0.8 ||\--- org.apache.tika:tika-core:1.12 -> 1.13 |+--- com.healthmarketscience.jackcess:jackcess:2.1.3 ||+--- commons-lang:commons-lang:2.6 ||\--- commons-logging:commons-logging:1.1.3 -> 1.2 |+--- com.healthmarketscience.jackcess:jackcess-encrypt:2.1.1 ||\--- com.healthmarketscience.jackcess:jackcess:2.1.0 -> 2.1.3 (*) |+--- net.sourceforge.jmatio:jmatio:1.0 |+--- org.apache.james:apache-mime4j-core:0.7.2 |+--- org.apache.james:apache-mime4j-dom:0.7.2 ||\--- org.apache.james:apache-mime4j-core:0.7.2 |+--- org.apache.commons:commons-compress:1.11 |+--- org.tukaani:xz:1.5 |+--- commons-codec:commons-codec:1.10 |+--- org.apache.pdfbox:pdfbox:2.0.1 ||+--- org.apache.pdfbox:fontbox:2.0.1 |||\--- commons-logging:commons-logging:1.2 ||\--- commons-logging:commons-logging:1.2 |+--- org.apache.pdfbox:pdfbox-tools:2.0.1 ||\--- org.apache.pdfbox:pdfbox-debugger:2.0.1 || \--- org.apache.pdfbox:pdfbox:2.0.1 (*) |+--- org.apache.pdfbox:jempbox:1.8.12 |+--- org.bouncycastle:bcmail-jdk15on:1.54 ||+--- org.bouncycastle:bcprov-jdk15on:1.54 ||\--- org.bouncycastle:bcpkix-jdk15on:1.54 || \--- org.bouncycastle:bcprov-jdk15on:1.54 |+--- org.bouncycastle:bcprov-jdk15on:1.54 |+--- org.apache.poi:poi:3.15-beta1 ||\--- commons-codec:commons-codec:1.10 |+--- org.apache.poi:poi-scratchpad:3.15-beta1 ||\--- org.apache.poi:poi:3.15-beta1 (*) |+--- org.apache.poi:poi-ooxml:3.15-beta1 ||+--- org.apache.poi:poi:3.15-beta1 (*) ||+--- org.apache.poi:poi-ooxml-schemas:3.15-beta1 |||\--- 
org.apache.xmlbeans:xmlbeans:2.6.0 ||\--- com.github.virtuald:curvesapi:1.03 |+--- org.ccil.cowan.tagsoup:tagsoup:1.2.1 |+--- org.ow2.asm:asm:5.0.4 |+--- com.googlecode.mp4parser:isoparser:1.1.18 |+--- com.drewnoakes:metadata-extractor:2.8.1 (*) |+--- de.l3s.boilerpipe:boilerpipe:1.1.0 |+--- com.rometools:rome:1.5.1 ||+--- com.rometools:rome-utils:1.5.1 ||\--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14 |+--- org.gagravarr:vorbis-java-core:0.8 |+--- com.googlecode.juniversalchardet:juniversalchardet:1.0.3 |+--- org.codelibs:jhighlight:1.0.2 |+--- com.pff:java-libpst:0.8.1 |+--- com.github.junrar:junrar:0.7 ||+--- commons-logging:commons-logging-api:1.1 ||\--- org.apache.commons:commons-vfs2:2.0 || +--- commons-logging:commons-logging:1.1.1 -> 1.2 || +--- org.apache.maven.scm:maven-scm-api:1.4 || |\--- org.codehaus.plexus:plexus-utils:1.5.6 || \--- org.apache.maven.scm:maven-scm-provider-svnexe:1.4 || +--- org.apache.maven.scm:maven-scm-provider-svn-commons:1.4 || |+--- org.apache.maven.scm:maven-scm-api:1.4 (*) || |\--- org.codehaus.plexus:plexus-utils:1.5.6 || +--- regexp:regexp:1.3 || +--- org.apache.maven.scm:maven-scm-api:1.4 (*) || \--- org.codehaus.plexus:plexus-utils:1.5.6 |+--- org.apache.cxf:cxf-rt-rs-client:3.0.3 ||+--- org.apache.cxf:cxf-rt-transports-http:3.0.3 |||\--- org.apache.cxf:cxf-core:3.0.3 ||| +--- org.codehaus.woodstox:woodstox-core-asl:4.4.1 ||| |\--- org.codehaus.woodstox:stax2-api:3.1.4 ||| \--- org.apache.ws.xmlschema:xmlschema-core:2.1.0 ||+--- org.apache.cxf:cxf-core:3.0.3 (*) ||\--- org.apache.cxf:cxf-rt-frontend-jaxrs:3.0.3 || +--- org.apache.cxf:cxf-core:3.0.3 (*) || +--- javax.ws.rs:javax.ws.rs-api:2.0.1 || +--- javax.annotation:javax.annotation-api:1.2 || \--- org.apache.cxf:cxf-rt-transports-http:3.0.3 (*) |+--- org.apache.opennlp:opennlp-tools:1.5.3 ||+--- org.apache.opennlp:opennlp-maxent:3.0.3 ||\--- net.sf.jwordnet:jwnl:1.3.3 |+--- commons-io:commons-io:2.4 |+--- org.apache.commons:commons-exec:1.3 |+--- 
com.googlecode.json-simple:json-simple:1.1.1 |+--- org.json:json:20140107 |+--- com.google.code.gson:gson:2.2.4 -> 2.4 |+--- edu.ucar:netcdf4:4.5.5 ||+--- edu.ucar:cdm:4.5.5 |||+--- edu.ucar:udunits:4.5.5 ||||+--- joda-time:joda-time:2.2 ||||\--- net.jcip:jcip-annotations:1.0 |||+---
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372123#comment-14372123 ] Tyler Palsulich commented on TIKA-1367: --- This is still worth doing, but it needs to be better than the dependency tree idea I gave above. Still not sure about a good solution. Should this be a page on the website? Tika documentation should list tika-parsers parser dependencies --- Key: TIKA-1367 URL: https://issues.apache.org/jira/browse/TIKA-1367 Project: Tika Issue Type: Improvement Components: documentation Reporter: Sergey Beryozkin Fix For: 1.8 tika-parsers module has many strong transitive parser dependencies. Maven users of tika-parsers have to exclude all the transitivie dependencies manually. Documenting the list of the existing transitive dependencies and keeping the list up to date will help developers exclude the libraries not needed for a given project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062188#comment-14062188 ] Tyler Palsulich commented on TIKA-1367: --- I think that letting users know just how big the tika-parsers dependency tree is, is a good idea. But why not just point them to {{cd tika-parsers && mvn dependency:tree | grep compile}}? That way, we don't have to manually update the list. On the other hand, similar to TIKA-411 (list of supported document formats), we could include the dependency tree on the website, under each release. People using tika-parsers probably don't check out the Tika source, so they wouldn't be able to run the {{dependency:tree}} command themselves. Tika documentation should list tika-parsers parser dependencies --- Key: TIKA-1367 URL: https://issues.apache.org/jira/browse/TIKA-1367 Project: Tika Issue Type: Improvement Components: documentation Reporter: Sergey Beryozkin Fix For: 1.6 tika-parsers module has many strong transitive parser dependencies. Maven users of tika-parsers have to exclude all the transitive dependencies manually. Documenting the list of the existing transitive dependencies and keeping the list up to date will help developers exclude the libraries not needed for a given project. -- This message was sent by Atlassian JIRA (v6.2#6252)
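Where grep is not at hand, the filtering step of the command above can be approximated in plain Java. The following is a rough sketch under stated assumptions: it reads `mvn dependency:tree` output from standard input, keeps only entries ending in `:compile`, and prints them as `group:artifact:version`. The class name `CompileScopeFilter` is invented for this example, and the parsing assumes the usual `group:artifact:type[:classifier]:version:scope` layout of the tree output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for "| grep compile"; not part of Maven or Tika.
final class CompileScopeFilter {

    // Captures the colon-separated coordinate that precedes a trailing ":compile".
    private static final Pattern COMPILE_ENTRY =
            Pattern.compile("([^\\s:]+(?::[^\\s:]+)+):compile\\s*$");

    // Returns "group:artifact:version" for every compile-scoped line.
    static List<String> compileScoped(List<String> treeLines) {
        List<String> result = new ArrayList<>();
        for (String line : treeLines) {
            Matcher m = COMPILE_ENTRY.matcher(line);
            if (m.find()) {
                String[] parts = m.group(1).split(":");
                // First two fields are group and artifact; the last one is the version.
                result.add(parts[0] + ":" + parts[1] + ":" + parts[parts.length - 1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            compileScoped(lines).forEach(System.out::println);
        }
    }
}
```

This still requires running Maven against a project that depends on tika-parsers, so it addresses only the grep part of the concern, not the need to have a build at hand.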
[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062606#comment-14062606 ] Sergey Beryozkin commented on TIKA-1367: Thanks for the proposal, though I'm not sure it would help. Consider a user who does not necessarily know what 'grep' is, for example someone working on Windows. Ideally, as a user I'd like to have an easy way to solve this typical dependency issue: my application will work with PDFs and OpenDocument docs only, so how can I get all but the relevant dependencies excluded? I know some source and Maven based search can yield some info, but that is not something every user can be expected to be able to do. For the record, here's what I see after grepping dependency:tree
{noformat}
[INFO] +- org.apache.tika:tika-core:jar:1.6-SNAPSHOT:compile
[INFO] +- org.gagravarr:vorbis-java-tika:jar:0.6:compile
[INFO] +- edu.ucar:netcdf:jar:4.2.20:compile
[INFO] |  +- edu.ucar:unidataCommon:jar:4.2.20:compile
[INFO] |  |  \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- net.sourceforge.jmatio:jmatio:jar:1.0:compile
[INFO] +- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] +- org.apache.james:apache-mime4j-dom:jar:0.7.2:compile
[INFO] +- org.apache.commons:commons-compress:jar:1.8:compile
[INFO] |  \- org.tukaani:xz:jar:1.5:compile
[INFO] +- commons-codec:commons-codec:jar:1.5:compile
[INFO] +- org.apache.pdfbox:pdfbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:fontbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:jempbox:jar:1.8.6:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
[INFO] +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
[INFO] +- org.apache.poi:poi:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-scratchpad:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-ooxml:jar:3.10-FINAL:compile
[INFO] |  +- org.apache.poi:poi-ooxml-schemas:jar:3.10-FINAL:compile
[INFO] |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
[INFO] |  \- dom4j:dom4j:jar:1.6.1:compile
[INFO] +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
[INFO] +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] +- org.ow2.asm:asm-debug-all:jar:4.1:compile
[INFO] +- com.googlecode.mp4parser:isoparser:jar:1.0-RC-1:compile
[INFO] |  \- org.aspectj:aspectjrt:jar:1.6.11:compile
[INFO] +- com.drewnoakes:metadata-extractor:jar:2.6.2:compile
[INFO] |  +- com.adobe.xmp:xmpcore:jar:5.1.2:compile
[INFO] |  \- xerces:xercesImpl:jar:2.8.1:compile
[INFO] |     \- xml-apis:xml-apis:jar:1.3.03:compile
[INFO] +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] +- rome:rome:jar:1.0:compile
[INFO] |  \- jdom:jdom:jar:1.0:compile
[INFO] +- org.gagravarr:vorbis-java-core:jar:0.6:compile
[INFO] +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] +- com.uwyn:jhighlight:jar:1.0:compile
[INFO] +- com.pff:java-libpst:jar:0.8.1:compile
{noformat}
It's a difficult task to start excluding. As a user, I have no idea what many of those dependencies are for, or whether some of them are needed by all Parser implementations. It's easy enough to spot what the PDF Parser will need (pdfbox), but trickier to see what else might be needed for PDF as well as for other types. Tika documentation should list tika-parsers parser dependencies --- Key: TIKA-1367 URL: https://issues.apache.org/jira/browse/TIKA-1367 Project: Tika Issue Type: Improvement Components: documentation Reporter: Sergey Beryozkin Fix For: 1.6 tika-parsers module has many strong transitive parser dependencies. Maven users of tika-parsers have to exclude all the transitive dependencies manually. Documenting the list of the existing transitive dependencies and keeping the list up to date will help developers exclude the libraries not needed for a given project. -- This message was sent by Atlassian JIRA (v6.2#6252)
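Once a user has somehow decided which libraries are not needed, writing the exclusions by hand is the remaining chore. As a small illustration (not an official Tika tool), the sketch below turns a list of `group:artifact` coordinates into the `<exclusions>` fragment that Maven expects inside a `<dependency>` element. The class name `ExclusionSnippet` is made up, and the two coordinates in `main` are placeholders taken from the tree above; nothing in this thread confirms that they are safe to drop for PDF or OpenDocument handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator for Maven <exclusions> fragments; not part of Tika or Maven.
final class ExclusionSnippet {

    // Builds an <exclusions> block from "group:artifact" coordinates.
    static String exclusionsFor(List<String> coordinates) {
        StringBuilder xml = new StringBuilder("<exclusions>\n");
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            if (parts.length < 2) {
                throw new IllegalArgumentException(
                        "Expected group:artifact but got: " + coordinate);
            }
            xml.append("  <exclusion>\n")
               .append("    <groupId>").append(parts[0]).append("</groupId>\n")
               .append("    <artifactId>").append(parts[1]).append("</artifactId>\n")
               .append("  </exclusion>\n");
        }
        return xml.append("</exclusions>\n").toString();
    }

    public static void main(String[] args) {
        // Placeholder coordinates only; whether they can be excluded is an assumption.
        System.out.print(exclusionsFor(
                Arrays.asList("edu.ucar:netcdf", "com.pff:java-libpst")));
    }
}
```

The generated fragment would be pasted under the tika-parsers `<dependency>` entry in the project's pom. The hard part, knowing which coordinates belong in the list, is exactly what this ticket asks the documentation to provide.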