[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-08 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079164#comment-16079164
 ] 

Gus Heck commented on TIKA-1367:


Update: I did start making test cases, but this may have been a matter of my 
Artifactory instance behaving strangely. I'm seeing different results with 
mavenCentral() directly. Artifactory was supposedly proxying Central 
transparently, but perhaps not quite.

> Tika documentation should list tika-parsers parser dependencies
> ---
>
> Key: TIKA-1367
> URL: https://issues.apache.org/jira/browse/TIKA-1367
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sergey Beryozkin
> Fix For: 1.17
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven 
> users of tika-parsers have to exclude all the transitive dependencies 
> manually. Documenting the list of the existing transitive dependencies and 
> keeping the list up to date will help developers exclude the libraries not 
> needed for a given project.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-06 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076440#comment-16076440
 ] 

Gus Heck commented on TIKA-1367:


I was using Gradle, not Maven, and I am assuming that Gradle's dependency task 
(which should behave the same as Maven's) does not have a bug. You can see the 
pasted output above. I'll do a more rigorous test, creating and packing actual 
shade/shadow jars, and provide example projects in Maven and Gradle (or else 
discover what it is I'm not understanding), hopefully this evening or tomorrow.
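One way to carry out the packing test described above is to open the resulting jar and check that the parser libraries actually landed in it. The sketch below uses only `java.util.jar`; the class name `JarPackageCheck` and the package prefixes it looks for are illustrative assumptions. It assumes a shaded (flattened) jar; a One-JAR-style jar nests the dependency jars instead of unpacking them, so there you would look for nested `.jar` entries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: reports expected package prefixes that have no classes in a jar. */
public class JarPackageCheck {

    public static List<String> missingPrefixes(String jarPath, List<String> prefixes) throws IOException {
        List<String> missing = new ArrayList<>(prefixes);
        try (JarFile jar = new JarFile(jarPath)) {
            // Every .class entry removes the prefixes it falls under from the "missing" list.
            jar.stream()
               .map(JarEntry::getName)
               .filter(name -> name.endsWith(".class"))
               .forEach(name -> missing.removeIf(name::startsWith));
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Example prefixes (assumptions): Tika's PDF parser package plus PDFBox and POI.
        List<String> prefixes = Arrays.asList(
            "org/apache/tika/parser/pdf/", "org/apache/pdfbox/", "org/apache/poi/");
        List<String> missing = missingPrefixes(args[0], prefixes);
        System.out.println(missing.isEmpty() ? "all expected packages present" : "missing: " + missing);
    }
}
```

Run it as `java JarPackageCheck path/to/shaded.jar`; an empty dependency graph at build time shows up here as every library prefix being reported missing.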

> Tika documentation should list tika-parsers parser dependencies
> ---
>
> Key: TIKA-1367
> URL: https://issues.apache.org/jira/browse/TIKA-1367
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sergey Beryozkin
> Fix For: 1.16
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven 
> users of tika-parsers have to exclude all the transitive dependencies 
> manually. Documenting the list of the existing transitive dependencies and 
> keeping the list up to date will help developers exclude the libraries not 
> needed for a given project.





[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-06 Thread Andreas Hubold (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076162#comment-16076162
 ] 

Andreas Hubold commented on TIKA-1367:
--

Yes, true. I was referring to [~gus_heck]'s comment, where he wrote that 
dependencies were completely lost in Maven when upgrading from 1.12 to 1.15.



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-06 Thread Manfred Schenk (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076150#comment-16076150
 ] 

Manfred Schenk commented on TIKA-1367:
--

If I understood the original ticket correctly, the problem was not getting 
access to the dependency:tree (which can easily be created with the well-known 
tools) but having some documentation of which dependencies are absolutely 
required and which are needed only for certain functionality. Since Tika has a 
huge number of dependencies, it is very likely that there will be some version 
conflicts in real-world projects. If some of these dependencies were 
documented as, e.g., "optional, only needed for accessing the content of files 
of type xyz", the developer could more easily decide to exclude them from the 
dependency tree and give up support for type xyz.



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-06 Thread Andreas Hubold (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076137#comment-16076137
 ] 

Andreas Hubold commented on TIKA-1367:
--

mvn dependency:tree lists the dependencies of tika-parsers 1.15 for me. They 
are also correctly listed in the pom available on Maven central: 
http://central.maven.org/maven2/org/apache/tika/tika-parsers/1.15/tika-parsers-1.15.pom
Maybe you've somehow installed a wrong pom into your maven repository?
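To check which pom actually ended up in a local repository, here is a sketch that lists the `<dependency>` entries declared directly under `<project><dependencies>`, using the JDK's DOM parser. `PomDependencies` is a hypothetical name; versions written as properties (e.g. `${project.version}`) are printed as-is, and versions not declared on the entry itself print as `?`. Running it against the copy in the local Maven repository (`~/.m2/repository/org/apache/tika/tika-parsers/1.15/tika-parsers-1.15.pom`) and against the copy from the URL above should show whether they differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: lists group:artifact:version for each direct <dependency> in a pom. */
public class PomDependencies {

    public static List<String> declared(String pomXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPEs so a tampered pom can't pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Not namespace-aware, so plain tag names like "dependency" match despite the POM namespace.
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
        List<String> coords = new ArrayList<>();
        // Only <project><dependencies><dependency>, not dependencyManagement or plugin dependencies.
        for (Element deps : children(doc.getDocumentElement(), "dependencies")) {
            for (Element dep : children(deps, "dependency")) {
                coords.add(text(dep, "groupId") + ":" + text(dep, "artifactId") + ":" + text(dep, "version"));
            }
        }
        return coords;
    }

    private static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    private static String text(Element parent, String tag) {
        List<Element> found = children(parent, tag);
        return found.isEmpty() ? "?" : found.get(0).getTextContent().trim(); // "?" = not declared here
    }

    public static void main(String[] args) throws Exception {
        String pom = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> deps = declared(pom);
        System.out.println(deps.size() + " direct dependencies declared");
        deps.forEach(System.out::println);
    }
}
```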



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-05 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075614#comment-16075614
 ] 

Gus Heck commented on TIKA-1367:


The Maven Shade, Gradle Shadow and OneJar plugins for both Maven and Gradle all 
rely on the dependency graph, which shows as empty in my Gradle output above, 
so I don't expect it to matter which one I use. I'll try to post minimal 
examples of each.

I'll look at the App jar, but that sounds like it will require unpacking into a 
local temp dir, and some horrible amount of custom dependency definition 
omitting group names to account for the lack of a Maven/Ivy structure in the 
directory to which it is unpacked... 

I can of course just manually list all of Tika's deps (after I figure out what 
they are by reading the build in detail), but that's horribly brittle, 
error-prone and tedious, and more or less a key reason Maven exists in the first 
place.



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-05 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074648#comment-16074648
 ] 

Nick Burch commented on TIKA-1367:
--

[~talli...@mitre.org] I'm not sure there is; we've fixed it in Tika 2.0, and 
just need to get the other breaking 2.x changes done so we can release.

The "easy but hacky" way to get all of Tika in a jar is to just use the Tika 
App jar. The Tika Bundle ships all the dependencies, and thus provides a single 
jar with everything in it, for those who use OSGi.

Otherwise, IIRC the Maven Shade plugin will give you a single jar with all deps 
inlined if you ask it to, based on the Tika Parser dependency (that's how the 
Tika App build works), or for Gradle users I believe that the shadowJar plugin 
is the currently recommended way to do the same.
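Whichever route is taken, a cheap runtime sanity check is to ask the class loader whether a representative class from each parser library is present. This sketch assumes one marker class per library is enough; the marker classes chosen in `main` (`PDDocument` from PDFBox, `POIFSFileSystem` from POI, `XWPFDocument` from poi-ooxml) are examples, and `ParserLibraryCheck` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks whether marker classes of parser libraries are loadable. */
public class ParserLibraryCheck {

    /** Returns, for each label, whether its marker class can be found on the classpath. */
    public static Map<String, Boolean> check(Map<String, String> markerClasses) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        ClassLoader loader = ParserLibraryCheck.class.getClassLoader();
        for (Map.Entry<String, String> e : markerClasses.entrySet()) {
            boolean present;
            try {
                // initialize=false: only resolve the class, don't run its static initializers
                Class.forName(e.getValue(), false, loader);
                present = true;
            } catch (ClassNotFoundException | LinkageError ex) {
                present = false;
            }
            result.put(e.getKey(), present);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("PDF (PDFBox)", "org.apache.pdfbox.pdmodel.PDDocument");
        markers.put("OLE2 Office (POI)", "org.apache.poi.poifs.filesystem.POIFSFileSystem");
        markers.put("OOXML (poi-ooxml)", "org.apache.poi.xwpf.usermodel.XWPFDocument");
        check(markers).forEach((label, ok) -> System.out.println(label + ": " + (ok ? "present" : "MISSING")));
    }
}
```

Running this from inside the packaged application (rather than from the IDE) is what makes it useful, since the IDE classpath usually has everything.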



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-05 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074615#comment-16074615
 ] 

Tim Allison commented on TIKA-1367:
---

Fellow devs, is this something we need to fix for 1.16?



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-04 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074179#comment-16074179
 ] 

Gus Heck commented on TIKA-1367:


Compilation seems to be solved by using core directly, but still no deps...
{code}
compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-core:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|+--- commons-io:commons-io:2.4
{code}



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2017-07-04 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074175#comment-16074175
 ] 

Gus Heck commented on TIKA-1367:


So when the dust settles here, how will one build a coherent, workable one-jar 
application that supports code like this that intends to make a best effort to 
parse any document that might be encountered:

{code}
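import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

  // document, sanitize(), plusSuffix(), log and Status belong to the surrounding indexing code
  byte[] rawData = document.getRawData();
  Tika tika = new Tika();
  tika.setMaxStringLength(rawData.length);
  Metadata metadata = new Metadata();
  try (ByteArrayInputStream bais = new ByteArrayInputStream(rawData)) {
    String textContent = tika.parseToString(bais, metadata);
    document.setRawData(textContent.getBytes(StandardCharsets.UTF_8));
    for (String name : metadata.names()) {
      document.put(sanitize(name) + plusSuffix(), metadata.get(name));
    }
  } catch (IOException | TikaException e) {
    log.warn("Tika processing failure!", e);
    // if Tika can't parse it, we certainly don't want random binary crap in the index
    document.setStatus(Status.DROPPED);
  }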
{code}

Although I notice that this is not marked as fixed yet, in 1.15 the above code 
no longer compiles... (and somehow there are no dependencies reported by 
Gradle...)
{code}

compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-parsers:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|+--- commons-io:commons-io:2.4
{code}
vs
{code}
+--- org.apache.tika:tika-parsers:1.12
|+--- org.apache.tika:tika-core:1.12
|+--- org.gagravarr:vorbis-java-tika:0.6
||\--- org.apache.tika:tika-core:1.5 -> 1.12
|+--- com.healthmarketscience.jackcess:jackcess:2.1.2
{code}

It seems very much like it's going to break everything: if Gradle doesn't see 
the deps, one-jar won't package them (all I did to cause this was change a 1.12 
to a 1.15 in the Gradle build).
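To see exactly what went missing between the two builds, one option is to save the `gradle dependencies` report for each version to a file and diff the sets of `group:artifact` coordinates. A minimal sketch; `DependencyDiff` is a hypothetical name, and the comparison deliberately ignores tree structure and versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: which group:artifact pairs vanished between two dependency reports. */
public class DependencyDiff {

    // First group:artifact:version-like token on a line; the version itself is ignored.
    private static final Pattern COORD = Pattern.compile("([\\w.\\-]+):([\\w.\\-]+):[\\w.\\-]+");

    static Set<String> artifacts(List<String> lines) {
        Set<String> result = new TreeSet<>(); // sorted for readable output
        for (String line : lines) {
            Matcher m = COORD.matcher(line);
            if (m.find()) {
                result.add(m.group(1) + ":" + m.group(2));
            }
        }
        return result;
    }

    /** Artifacts listed in the old report but absent from the new one. */
    public static Set<String> dropped(List<String> oldLines, List<String> newLines) {
        Set<String> dropped = artifacts(oldLines);
        dropped.removeAll(artifacts(newLines));
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        List<String> oldReport = Files.readAllLines(Paths.get(args[0]));
        List<String> newReport = Files.readAllLines(Paths.get(args[1]));
        dropped(oldReport, newReport).forEach(a -> System.out.println("dropped: " + a));
    }
}
```

Applied to the two excerpts above, it would report tika-core, vorbis-java-tika and jackcess as dropped, which matches the "no deps" symptom.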



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2016-08-05 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409311#comment-15409311
 ] 

Nick Burch commented on TIKA-1367:
--

The code exists and you can check out the more modular parsers already; see 
http://wiki.apache.org/tika/Tika2_0RoadMap . The main blocker is still coming 
up with a way to reset or change the SAX stream when multiple parsers are used 
for one document. That needs solving before we can finalise the 2.x API.

> Tika documentation should list tika-parsers parser dependencies
> ---
>
> Key: TIKA-1367
> URL: https://issues.apache.org/jira/browse/TIKA-1367
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sergey Beryozkin
> Fix For: 1.14
>
>
> tika-parsers module has many strong transitive parser dependencies. Maven 
> users of tika-parsers have to exclude all the transitive dependencies 
> manually. Documenting the list of the existing transitive dependencies and 
> keeping the list up to date will help developers exclude the libraries not 
> needed for a given project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2016-08-05 Thread Manfred Schenk (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409282#comment-15409282
 ] 

Manfred Schenk commented on TIKA-1367:
--

Is there a roadmap for version 2.0?



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2016-08-05 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409272#comment-15409272
 ] 

Nick Burch commented on TIKA-1367:
--

This should be largely fixed on the 2.x branch, which has more modular parsers. 
That lets you see which dependencies are needed for which logical groupings of 
parsers.

Given that, and given the refactoring needed to support this, I'm minded to say 
we won't fix it on 1.x, and you'll need to move to 2.x if you need that level 
of detail.



[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2016-08-05 Thread Manfred Schenk (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409222#comment-15409222
 ] 

Manfred Schenk commented on TIKA-1367:
--

Definitely, a lot more detailed information about the dependencies is needed 
for tika-parsers.

This is the current list of dependencies that Gradle pulls in when using 
tika-parsers:

+--- org.apache.tika:tika-parsers:1.13
|+--- org.apache.tika:tika-core:1.13
|+--- org.gagravarr:vorbis-java-tika:0.8
||\--- org.apache.tika:tika-core:1.12 -> 1.13
|+--- com.healthmarketscience.jackcess:jackcess:2.1.3
||+--- commons-lang:commons-lang:2.6
||\--- commons-logging:commons-logging:1.1.3 -> 1.2
|+--- com.healthmarketscience.jackcess:jackcess-encrypt:2.1.1
||\--- com.healthmarketscience.jackcess:jackcess:2.1.0 -> 2.1.3 (*)
|+--- net.sourceforge.jmatio:jmatio:1.0
|+--- org.apache.james:apache-mime4j-core:0.7.2
|+--- org.apache.james:apache-mime4j-dom:0.7.2
||\--- org.apache.james:apache-mime4j-core:0.7.2
|+--- org.apache.commons:commons-compress:1.11
|+--- org.tukaani:xz:1.5
|+--- commons-codec:commons-codec:1.10
|+--- org.apache.pdfbox:pdfbox:2.0.1
||+--- org.apache.pdfbox:fontbox:2.0.1
|||\--- commons-logging:commons-logging:1.2
||\--- commons-logging:commons-logging:1.2
|+--- org.apache.pdfbox:pdfbox-tools:2.0.1
||\--- org.apache.pdfbox:pdfbox-debugger:2.0.1
|| \--- org.apache.pdfbox:pdfbox:2.0.1 (*)
|+--- org.apache.pdfbox:jempbox:1.8.12
|+--- org.bouncycastle:bcmail-jdk15on:1.54
||+--- org.bouncycastle:bcprov-jdk15on:1.54
||\--- org.bouncycastle:bcpkix-jdk15on:1.54
|| \--- org.bouncycastle:bcprov-jdk15on:1.54
|+--- org.bouncycastle:bcprov-jdk15on:1.54
|+--- org.apache.poi:poi:3.15-beta1
||\--- commons-codec:commons-codec:1.10
|+--- org.apache.poi:poi-scratchpad:3.15-beta1
||\--- org.apache.poi:poi:3.15-beta1 (*)
|+--- org.apache.poi:poi-ooxml:3.15-beta1
||+--- org.apache.poi:poi:3.15-beta1 (*)
||+--- org.apache.poi:poi-ooxml-schemas:3.15-beta1
|||\--- org.apache.xmlbeans:xmlbeans:2.6.0
||\--- com.github.virtuald:curvesapi:1.03
|+--- org.ccil.cowan.tagsoup:tagsoup:1.2.1
|+--- org.ow2.asm:asm:5.0.4
|+--- com.googlecode.mp4parser:isoparser:1.1.18
|+--- com.drewnoakes:metadata-extractor:2.8.1 (*)
|+--- de.l3s.boilerpipe:boilerpipe:1.1.0
|+--- com.rometools:rome:1.5.1
||+--- com.rometools:rome-utils:1.5.1
||\--- org.slf4j:slf4j-api:1.7.7 -> 1.7.14
|+--- org.gagravarr:vorbis-java-core:0.8
|+--- com.googlecode.juniversalchardet:juniversalchardet:1.0.3
|+--- org.codelibs:jhighlight:1.0.2
|+--- com.pff:java-libpst:0.8.1
|+--- com.github.junrar:junrar:0.7
||+--- commons-logging:commons-logging-api:1.1
||\--- org.apache.commons:commons-vfs2:2.0
|| +--- commons-logging:commons-logging:1.1.1 -> 1.2
|| +--- org.apache.maven.scm:maven-scm-api:1.4
|| |\--- org.codehaus.plexus:plexus-utils:1.5.6
|| \--- org.apache.maven.scm:maven-scm-provider-svnexe:1.4
||  +--- org.apache.maven.scm:maven-scm-provider-svn-commons:1.4
||  |+--- org.apache.maven.scm:maven-scm-api:1.4 (*)
||  |\--- org.codehaus.plexus:plexus-utils:1.5.6
||  +--- regexp:regexp:1.3
||  +--- org.apache.maven.scm:maven-scm-api:1.4 (*)
||  \--- org.codehaus.plexus:plexus-utils:1.5.6
|+--- org.apache.cxf:cxf-rt-rs-client:3.0.3
||+--- org.apache.cxf:cxf-rt-transports-http:3.0.3
|||\--- org.apache.cxf:cxf-core:3.0.3
||| +--- org.codehaus.woodstox:woodstox-core-asl:4.4.1
||| |\--- org.codehaus.woodstox:stax2-api:3.1.4
||| \--- org.apache.ws.xmlschema:xmlschema-core:2.1.0
||+--- org.apache.cxf:cxf-core:3.0.3 (*)
||\--- org.apache.cxf:cxf-rt-frontend-jaxrs:3.0.3
|| +--- org.apache.cxf:cxf-core:3.0.3 (*)
|| +--- javax.ws.rs:javax.ws.rs-api:2.0.1
|| +--- javax.annotation:javax.annotation-api:1.2
|| \--- org.apache.cxf:cxf-rt-transports-http:3.0.3 (*)
|+--- org.apache.opennlp:opennlp-tools:1.5.3
||+--- org.apache.opennlp:opennlp-maxent:3.0.3
||\--- net.sf.jwordnet:jwnl:1.3.3
|+--- commons-io:commons-io:2.4
|+--- org.apache.commons:commons-exec:1.3
|+--- com.googlecode.json-simple:json-simple:1.1.1
|+--- org.json:json:20140107
|+--- com.google.code.gson:gson:2.2.4 -> 2.4
|+--- edu.ucar:netcdf4:4.5.5
||+--- edu.ucar:cdm:4.5.5
|||+--- edu.ucar:udunits:4.5.5
||||+--- joda-time:joda-time:2.2
||||\--- net.jcip:jcip-annotations:1.0
|||+--- 

[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2015-03-20 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372123#comment-14372123
 ] 

Tyler Palsulich commented on TIKA-1367:
---

This is still worth doing, but it needs to be better than the dependency tree 
idea I gave above. Still not sure about a good solution. Should this be a page 
on the website?

 Tika documentation should list tika-parsers parser dependencies
 ---

 Key: TIKA-1367
 URL: https://issues.apache.org/jira/browse/TIKA-1367
 Project: Tika
  Issue Type: Improvement
  Components: documentation
Reporter: Sergey Beryozkin
 Fix For: 1.8


 tika-parsers module has many strong transitive parser dependencies. Maven 
 users of tika-parsers have to exclude all the transitive dependencies 
 manually. Documenting the list of the existing transitive dependencies and 
 keeping the list up to date will help developers exclude the libraries not 
 needed for a given project.





[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2014-07-15 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062188#comment-14062188
 ] 

Tyler Palsulich commented on TIKA-1367:
---

I think that letting users know just how big the tika-parsers dependency tree 
is, is a good idea. But why not just point them to {{cd tika-parsers && mvn 
dependency:tree | grep compile}}? That way, we don't have to manually update 
the list.

On the other hand, similar to TIKA-411 (list of supported document formats), we 
could include the dependency tree on the website, under each release. People 
using tika-parsers probably don't check out the Tika source, so they wouldn't 
be able to run the {{dependency:tree}} command themselves.
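For users without `grep`, for example on Windows, the same filtering can be done in a few lines of Java. A sketch, assuming the usual `group:artifact:type[:classifier]:version:scope` coordinates that `mvn dependency:tree` prints; `CompileScopeFilter` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: a portable "grep compile" for mvn dependency:tree output. */
public class CompileScopeFilter {

    /** Returns group:artifact:version for every tree line whose scope equals the given one. */
    public static List<String> filter(List<String> treeLines, String scope) {
        List<String> result = new ArrayList<>();
        for (String line : treeLines) {
            for (String token : line.trim().split("\\s+")) {
                String[] parts = token.split(":");
                // g:a:type:version:scope or g:a:type:classifier:version:scope
                if (parts.length == 5 || parts.length == 6) {
                    if (parts[parts.length - 1].equals(scope)) {
                        result.add(parts[0] + ":" + parts[1] + ":" + parts[parts.length - 2]);
                    }
                    break; // only the first coordinate-like token on a line matters
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String scope = args.length > 0 ? args[0] : "compile";
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            filter(in.lines().collect(Collectors.toList()), scope).forEach(System.out::println);
        }
    }
}
```

After compiling it with `javac`, `mvn dependency:tree | java CompileScopeFilter compile` works the same way in a Unix shell and in the Windows command prompt.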

 Tika documentation should list tika-parsers parser dependencies
 ---

 Key: TIKA-1367
 URL: https://issues.apache.org/jira/browse/TIKA-1367
 Project: Tika
  Issue Type: Improvement
  Components: documentation
Reporter: Sergey Beryozkin
 Fix For: 1.6


 tika-parsers module has many strong transitive parser dependencies. Maven 
 users of tika-parsers have to exclude all the transitive dependencies 
 manually. Documenting the list of the existing transitive dependencies and 
 keeping the list up to date will help developers exclude the libraries not 
 needed for a given project.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2014-07-15 Thread Sergey Beryozkin (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062606#comment-14062606
 ] 

Sergey Beryozkin commented on TIKA-1367:


Thanks for the proposal, though I'm not sure it would help. Consider a user 
who doesn't necessarily know what 'grep' is, for example someone working on 
Windows. Ideally, as a user, I'd like an easy way to solve this typical 
dependency issue: "My application will work with PDFs and OpenDocument docs 
only; how can I get all but the relevant dependencies excluded?" I know some 
source and Maven based searching can yield some info, but it is not something 
every user can be expected to do. 
For the record, here's what I see after grepping dependency:tree:

{noformat}
[INFO] +- org.apache.tika:tika-core:jar:1.6-SNAPSHOT:compile
[INFO] +- org.gagravarr:vorbis-java-tika:jar:0.6:compile
[INFO] +- edu.ucar:netcdf:jar:4.2.20:compile
[INFO] |  +- edu.ucar:unidataCommon:jar:4.2.20:compile
[INFO] |  |  \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] +- net.sourceforge.jmatio:jmatio:jar:1.0:compile
[INFO] +- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] +- org.apache.james:apache-mime4j-dom:jar:0.7.2:compile
[INFO] +- org.apache.commons:commons-compress:jar:1.8:compile
[INFO] |  \- org.tukaani:xz:jar:1.5:compile
[INFO] +- commons-codec:commons-codec:jar:1.5:compile
[INFO] +- org.apache.pdfbox:pdfbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:fontbox:jar:1.8.6:compile
[INFO] |  +- org.apache.pdfbox:jempbox:jar:1.8.6:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
[INFO] +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
[INFO] +- org.apache.poi:poi:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-scratchpad:jar:3.10-FINAL:compile
[INFO] +- org.apache.poi:poi-ooxml:jar:3.10-FINAL:compile
[INFO] |  +- org.apache.poi:poi-ooxml-schemas:jar:3.10-FINAL:compile
[INFO] |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
[INFO] |  \- dom4j:dom4j:jar:1.6.1:compile
[INFO] +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
[INFO] +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] +- org.ow2.asm:asm-debug-all:jar:4.1:compile
[INFO] +- com.googlecode.mp4parser:isoparser:jar:1.0-RC-1:compile
[INFO] |  \- org.aspectj:aspectjrt:jar:1.6.11:compile
[INFO] +- com.drewnoakes:metadata-extractor:jar:2.6.2:compile
[INFO] |  +- com.adobe.xmp:xmpcore:jar:5.1.2:compile
[INFO] |  \- xerces:xercesImpl:jar:2.8.1:compile
[INFO] | \- xml-apis:xml-apis:jar:1.3.03:compile
[INFO] +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] +- rome:rome:jar:1.0:compile
[INFO] |  \- jdom:jdom:jar:1.0:compile
[INFO] +- org.gagravarr:vorbis-java-core:jar:0.6:compile
[INFO] +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] +- com.uwyn:jhighlight:jar:1.0:compile
[INFO] +- com.pff:java-libpst:jar:0.8.1:compile

{noformat}

It's a difficult task to start excluding. As a user, I've no idea what many of 
those dependencies are for, and whether some of them are needed by all Parser 
implementations or not. It's easy enough to spot what the PDF Parser will need 
(pdfbox), but trickier to see what else might be needed for PDF as well as 
for other types.
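Once someone has decided which direct dependencies a given use case needs, turning that decision into Maven exclusions is mechanical. This sketch takes the direct dependencies of tika-parsers (a few are copied from the tree above) plus a keep set, and prints an `<exclusions>` block to place under the tika-parsers `<dependency>`. The keep set in `main` is an assumption for illustration, not a verified minimal set for PDF support, and `ExclusionPlanner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: turns a "keep" decision into Maven <exclusion> entries. */
public class ExclusionPlanner {

    /** Direct dependencies (group:artifact) that are not in the keep set, in input order. */
    public static List<String> toExclude(List<String> directDeps, Set<String> keep) {
        List<String> result = new ArrayList<>();
        for (String ga : directDeps) {
            if (!keep.contains(ga)) {
                result.add(ga);
            }
        }
        return result;
    }

    /** Renders group:artifact pairs as an <exclusions> block for the tika-parsers <dependency>. */
    public static String toMavenXml(List<String> exclusions) {
        StringBuilder sb = new StringBuilder("<exclusions>\n");
        for (String ga : exclusions) {
            String[] parts = ga.split(":");
            sb.append("  <exclusion>\n")
              .append("    <groupId>").append(parts[0]).append("</groupId>\n")
              .append("    <artifactId>").append(parts[1]).append("</artifactId>\n")
              .append("  </exclusion>\n");
        }
        return sb.append("</exclusions>").toString();
    }

    public static void main(String[] args) {
        // A few direct dependencies copied from the 1.6-SNAPSHOT dependency:tree output above.
        List<String> direct = Arrays.asList(
            "org.apache.tika:tika-core", "org.apache.pdfbox:pdfbox",
            "org.apache.poi:poi", "org.apache.poi:poi-ooxml", "com.pff:java-libpst");
        // ASSUMPTION for illustration only: what a PDF-only application keeps.
        Set<String> keep = new HashSet<>(Arrays.asList("org.apache.tika:tika-core", "org.apache.pdfbox:pdfbox"));
        System.out.println(toMavenXml(toExclude(direct, keep)));
    }
}
```

Because Maven applies an exclusion throughout the transitive graph of the dependency it is declared on, excluding a direct dependency also drops its own children unless another path pulls them in; the hard part the ticket asks about, knowing which libraries each parser needs, still has to come from documentation.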
