[jira] [Updated] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1744: -- Attachment: TIKA-1744-2.patch Additional minor patch: - Corrected javadoc links - Added {{@Deprecated}}

[jira] [Updated] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1755: -- Attachment: TIKA-1755.patch Initial patch > Make ppt and pptx paragraph/div breaks more consistent >

[jira] [Commented] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938167#comment-14938167 ] Hudson commented on TIKA-1707: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #860 (See

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938166#comment-14938166 ] Hudson commented on TIKA-1742: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #860 (See

[jira] [Created] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1755: - Summary: Make ppt and pptx paragraph/div breaks more consistent Key: TIKA-1755 URL: https://issues.apache.org/jira/browse/TIKA-1755 Project: Tika Issue Type:

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938893#comment-14938893 ] Tim Allison commented on TIKA-1757: --- Sorry about that. Will fix shortly. Thank you! > tika-batch tests

[jira] [Assigned] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1757: - Assignee: Tim Allison > tika-batch tests fail on systems with whitespace or special chars in

[jira] [Updated] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1758: Description: All tests for CLI module fail with errors like that: {noformat} Tests run: 6,

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938915#comment-14938915 ] Uwe Schindler commented on TIKA-1757: - The other issue is different, I opened TIKA-1758 > tika-batch

Re: extracting contributor information?

2015-09-30 Thread Ray Gauss
For edits I'd say +1. For annotations and comments I'm undecided, the DC definition is somewhat vague: "An entity responsible for making contributions to the resource." If a user's comment is "this document is terrible" is he/she a contributor? Regards, Ray > On Sep 30, 2015, at 4:31 PM,

RE: extracting contributor information?

2015-09-30 Thread Allison, Timothy B.
Y, I agree. What I want it to mean is "anyone who touched the document"...not necessarily those "responsible" for the resource: "Contributor is the most general of the elements used for "agents" responsible for the resource" Given that contributor is the most general, that seems the best fit

[jira] [Created] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1757: --- Summary: tika-batch tests fail on systems with whitespace or special chars in folder name Key: TIKA-1757 URL: https://issues.apache.org/jira/browse/TIKA-1757 Project:

[jira] [Created] (TIKA-1756) Update forbiddenapis to v2.0

2015-09-30 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1756: --- Summary: Update forbiddenapis to v2.0 Key: TIKA-1756 URL: https://issues.apache.org/jira/browse/TIKA-1756 Project: Tika Issue Type: Improvement

RE: [jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Yaniv Kunda
Tim - I actually had a shelved changelist with improvements almost identical to what you did for FSBatchTestBase! I also shared the thought that the utility methods - countChildren, readFileToString, deleteDirectory, listPaths - should be elsewhere. Ideally in commons-io, but this will have to

[jira] [Created] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Uwe Schindler (JIRA)
Uwe Schindler created TIKA-1758: --- Summary: BatchCommandLineBuilder fails on systems with whitespace in path Key: TIKA-1758 URL: https://issues.apache.org/jira/browse/TIKA-1758 Project: Tika

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938914#comment-14938914 ] Yaniv Kunda commented on TIKA-1757: --- Also, regarding the badness of {{URL#getFile()}} - on Windows

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938906#comment-14938906 ] Uwe Schindler commented on TIKA-1757: - Please wait with committing; there are more tests failing with

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938908#comment-14938908 ] Yaniv Kunda commented on TIKA-1757: --- If one needs a java.nio.file.Path, {{Paths.get(url.toURI())}} can be
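
For readers following the TIKA-1757 thread, here is a minimal, illustrative sketch of the conversion mentioned in the comment above. It is not taken from any of the attached patches. The class name UrlToPathSketch, the helper toPath and the directory name "dir with space" are hypothetical. It shows only why {{URL#getFile()}} is fragile when a folder name contains whitespace, and how {{Paths.get(url.toURI())}} avoids the problem for file: URLs.

```java
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UrlToPathSketch {

    // Hypothetical helper: converts a file: URL to a Path via its URI form,
    // so percent-escapes such as %20 are decoded by the file system provider.
    static Path toPath(URL url) throws Exception {
        return Paths.get(url.toURI());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical folder name containing whitespace; it does not need to exist.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "dir with space")
                .toAbsolutePath();
        URL url = dir.toUri().toURL();

        // getFile() returns the raw URL path, so the spaces remain encoded as %20.
        System.out.println("URL#getFile():        " + url.getFile());

        // Going through the URI yields a Path with the real folder name.
        System.out.println("Paths.get(toURI()):   " + toPath(url));
    }
}
```

Passing the result of getFile() straight to new File(...) would name a folder that literally contains "%20", which is one plausible way such tests fail on systems with spaces in the checkout path. Whether this is the exact failure in tika-batch is an assumption, since the issue text in this listing is truncated.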

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938913#comment-14938913 ] Tim Allison commented on TIKA-1757: --- Y, won't be able to fix for a few hours, but I can replicate the

extracting contributor information?

2015-09-30 Thread Allison, Timothy B.
All, It might be useful to extract contributor information (names of people who made annotations/edits/comments) from at least MSOffice and PDF documents into our Metadata object. Any interest in this type of extraction? Any objections to using dc:contributor for this? Best,

[jira] [Commented] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Andreas Beeker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938860#comment-14938860 ] Andreas Beeker commented on TIKA-1755: -- I think, the goal would be, to modify common sl in such a way,

[jira] [Updated] (TIKA-1756) Update forbiddenapis to v2.0

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1756: Attachment: TIKA-1756.patch > Update forbiddenapis to v2.0 > > >

[jira] [Commented] (TIKA-1756) Update forbiddenapis to v2.0

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938879#comment-14938879 ] Uwe Schindler commented on TIKA-1756: - While testing this I found out that Tika's tests break when

[jira] [Updated] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated TIKA-1757: Attachment: TIKA-1757.patch Patch for broken test. > tika-batch tests fail on systems with

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936698#comment-14936698 ] Tim Allison commented on TIKA-1748: --- Thank you! Will commit today. > Upgrade to POI 3.13-final when

[jira] [Assigned] (TIKA-1752) Use java.nio.file.Path in org.apache.tika.detect

2015-09-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov reassigned TIKA-1752: --- Assignee: Konstantin Gribov > Use java.nio.file.Path in org.apache.tika.detect >

[jira] [Comment Edited] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939102#comment-14939102 ] Tim Allison edited comment on TIKA-1758 at 10/1/15 12:27 AM: - r1706178. Thank

[jira] [Resolved] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1758. --- Resolution: Fixed r1706178. Thank you, [~kunda]. > BatchCommandLineBuilder fails on systems with

[jira] [Updated] (TIKA-1751) Use java.nio.file.Path in TikaConfig

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1751: -- Attachment: TIKA-1751.patch Updated patch to latest changes. > Use java.nio.file.Path in TikaConfig >

[jira] [Updated] (TIKA-1751) Use java.nio.file.Path in TikaConfig

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1751: -- Attachment: (was: TIKA-1751.patch) > Use java.nio.file.Path in TikaConfig >

[jira] [Resolved] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1757. --- Resolution: Fixed Mea culpa. Tests pass for me on Windows with space in path and Linux. Let me know

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938917#comment-14938917 ] Uwe Schindler commented on TIKA-1757: - bq. If one needs a java.nio.file.Path, Paths.get(url.toURI())

[jira] [Commented] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938972#comment-14938972 ] Yaniv Kunda commented on TIKA-1758: --- Not a hard requirement - can be avoided by converting a Path back to

[jira] [Updated] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1758: -- Attachment: TIKA-1758.patch A patch containing a fix (and more File->Path migration), requires

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939120#comment-14939120 ] Hudson commented on TIKA-1757: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #861 (See

[jira] [Assigned] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1747: - Assignee: Tim Allison > Change file->path in tika-batch throughout >

[jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1747. --- Resolution: Fixed r1706060 > Change file->path in tika-batch throughout >

[jira] [Resolved] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1754. --- Resolution: Fixed Fixed with TIKA-1747. > tika-batch's FileListCrawler truncates the first character

[jira] [Commented] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937440#comment-14937440 ] Tim Allison commented on TIKA-1754: --- Y, probably. This particular issue is fixed for now. Will take a

[jira] [Commented] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937030#comment-14937030 ] Hudson commented on TIKA-1744: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #857 (See

[jira] [Assigned] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1744: - Assignee: Tim Allison > Use java.nio.file.Path in TikaInputStream >

[jira] [Created] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1754: - Summary: tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X: Key: TIKA-1754 URL: https://issues.apache.org/jira/browse/TIKA-1754

[jira] [Commented] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937701#comment-14937701 ] Hudson commented on TIKA-1747: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #858 (See

[jira] [Resolved] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1707. --- Resolution: Fixed r1706079. Thank you, [~kiwiwings]! Apologies to you and [~gagravarr] for not

[jira] [Resolved] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1742. --- Resolution: Fixed r1706086 > StackOverflowError parsing a PDF with ExtractInlineImages=true >

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937720#comment-14937720 ] Tim Allison commented on TIKA-1742: --- Thank you, [~nated], for raising this, and thank you [~tilman] for a

[jira] [Resolved] (TIKA-1752) Use java.nio.file.Path in org.apache.tika.detect

2015-09-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov resolved TIKA-1752. - Resolution: Fixed > Use java.nio.file.Path in org.apache.tika.detect >