Re: [VOTE] Release Apache Tika 1.6 RC #2
Hi Folks,

OK, I started this two days ago... here I finish up.

On Mon, Sep 1, 2014 at 9:39 AM, dev-digest-h...@tika.apache.org wrote:

> A candidate for the Tika 1.6 release is available at:
> http://people.apache.org/~mattmann/apache-tika-1.6/rc2/

I checked all the artifacts and all are fine except for

-rw-r--r--  1 lmcgibbn  admin  36133725 Aug 31 21:55 tika-server-1.6.jar
-rw-r--r--  1 lmcgibbn  admin       243 Aug 31 21:55 tika-server-1.6.jar.asc
-rw-r--r--  1 lmcgibbn  admin        68 Aug 31 22:14 tika-server-1.6.jar.sha1

which does not have an md5... but I feel that this is not a blocker, as I verified everything else and it is A-OK.

> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.6-rc2/

mvn clean install is fine on the tag locally.

> https://repository.apache.org/content/repositories/orgapachetika-1004/

I used Crawler Commons in a test project for sitemap parsing; the staging artifact for Tika looks great.

> Please vote on releasing this package as Apache Tika 1.6.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.

[X ] +1 Release this package as Apache Tika 1.6

Thanks to Chris, amongst others, for the persistence on this release.

Lewis
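
For anyone repeating the check on tika-server-1.6.jar, where only the .sha1 is present, the comparison can be done with the JDK alone. What follows is a minimal sketch, not part of the release tooling: the class name Sha1Check is made up for illustration, and it assumes the .sha1 file begins with the hex digest, optionally followed by whitespace and a file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika or the release scripts.
final class Sha1Check {

    // Streams the file through SHA-1 and returns the digest as lowercase hex.
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumption: the .sha1 file starts with the hex digest, optionally
    // followed by whitespace and a file name.
    static boolean matches(Path artifact, Path sha1File)
            throws IOException, NoSuchAlgorithmException {
        String recorded = new String(Files.readAllBytes(sha1File), StandardCharsets.US_ASCII)
                .trim().split("\\s+")[0];
        return recorded.equalsIgnoreCase(sha1Hex(artifact));
    }

    public static void main(String[] args) throws Exception {
        Path artifact = Paths.get(args[0]);
        Path sha1File = Paths.get(args[0] + ".sha1");
        System.out.println(matches(artifact, sha1File) ? "SHA-1 OK" : "SHA-1 MISMATCH");
    }
}
```

Run it with the path to the jar as the only argument; it looks for the .sha1 file next to it. The .asc signature still has to be checked separately with gpg.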
tika-trunk-jdk1.7 - Build # 192 - Failure
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #192)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/192/ to view the results.
[jira] [Commented] (TIKA-1330) Add robust tika-batch code
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119745#comment-14119745 ]

Tim Allison commented on TIKA-1330:
---
Looks like the ballpark estimate of processing time on TIKA-1302 was about right. I just finished a complete run of govdocs1 (~1 million files) on an 8-CPU VM with 8 GB available, -Xmx4g. The run used 15 consumers and completed in about 4 hours. The driver restarted the process thirteen times (6 permanent hangs and 7 OOM).

Add robust tika-batch code
--
Key: TIKA-1330
URL: https://issues.apache.org/jira/browse/TIKA-1330
Project: Tika
Issue Type: Sub-task
Components: cli, general, server
Reporter: Tim Allison
Assignee: Tim Allison

In my current design plan, I see creating a separate component, tika-batch, that includes a small bit of configurable code to run Tika against a large batch of documents. This code should be robust against OOM and hangs, and it should have fairly robust logging.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
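
To make the "robust against hangs" requirement concrete, here is a minimal sketch of one way to bound the time spent on a single document inside one JVM. It is not the tika-batch implementation: the class TimedTask and its Outcome enum are hypothetical, and an in-process timeout like this cannot recover from an OOM or from a thread that ignores interruption, which is presumably why the run described above relies on a driver that restarts the whole process.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; this is not the tika-batch code.
final class TimedTask {

    enum Outcome { OK, FAILED, TIMED_OUT }

    // Runs one unit of work (for example, parsing one file) on a worker
    // thread and gives up waiting after timeoutMillis.
    static Outcome run(Callable<?> work, long timeoutMillis) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = worker.submit(work);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; a truly hung thread may ignore it
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the work itself threw
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would log every TIMED_OUT and FAILED outcome together with the file name, so that the problem documents can be collected after the run.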
[jira] [Created] (TIKA-1409) Error asking for a directory mime-type
Piero Ottuzzi created TIKA-1409:
---
Summary: Error asking for a directory mime-type
Key: TIKA-1409
URL: https://issues.apache.org/jira/browse/TIKA-1409
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 1.5
Environment: Windows 7 and JDK 1.8
Reporter: Piero Ottuzzi

Hi there,

just out of curiosity I used the code you can find at the end of the Content and language detection page[1] to get the Tika mimetype for a directory. I tried it on a well-known directory (System.getProperty("user.home")) and I got:

java.io.FileNotFoundException: C:\Users\2913 (Access is denied)
    at java.io.FileInputStream.open(Native Method) ~[na:1.8.0_11]
    at java.io.FileInputStream.<init>(FileInputStream.java:131) ~[na:1.8.0_11]
    at org.apache.tika.io.TikaInputStream.<init>(TikaInputStream.java:444) ~[tika-core-1.5.jar:na]
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:231) ~[tika-core-1.5.jar:na]
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:212) ~[tika-core-1.5.jar:na]

Obviously the directory exists and it is readable. Is this the expected behaviour?

Thanks
Bye
Piero

[1] http://tika.apache.org/1.5/detection.html
[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type
[ https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119912#comment-14119912 ]

Nick Burch commented on TIKA-1409:
--
Directories don't have mime types, only content does.

It looks like your JVM is giving a somewhat confusing error message if you try to open a directory as if it were a file, but overall, asking for the mime type of a directory will never work, so I'm not sure to what extent we want to add a special error message?
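
The JVM behaviour described here can be reproduced without Tika on the classpath. The java.io.FileInputStream constructor is documented to throw FileNotFoundException when the named file exists but is a directory, and the wording of the message depends on the platform. The sketch below is only an illustration; the class name OpenDirectoryDemo is made up.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

// Reproduces the failure from the stack trace using only the JDK.
final class OpenDirectoryDemo {

    // Returns the exception message produced by opening the path as a file,
    // or null if the open succeeded.
    static String tryOpen(File path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return null;
        } catch (FileNotFoundException e) {
            return e.getMessage(); // wording is platform-dependent
        }
    }

    public static void main(String[] args) throws IOException {
        File home = new File(System.getProperty("user.home"));
        System.out.println("isDirectory: " + home.isDirectory());
        System.out.println("open failed with: " + tryOpen(home));
    }
}
```

On the reporter's Windows 7 setup the message was "(Access is denied)"; other platforms word it differently, which is what makes the error look like a permissions problem rather than a wrong-kind-of-path problem.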
[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type
[ https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119949#comment-14119949 ]

Piero Ottuzzi commented on TIKA-1409:
-
Hi,

I can agree with you that this is close to nonsense, but a Google search shows that on many Linux distros the mime type for a directory is inode/directory, and I cannot find it in tika-mimetypes.xml[1]. So the test was done only to understand what Apache Tika was going to print, and I was a bit surprised by the unexpected result.

Do you think it is worth adding inode/directory to Tika as a mime type for directories? It is a simple, yet not fully RFC compliant, way to fix this corner case.

Thanks
Bye
Piero

[1] http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
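
Until something like this exists in Tika, the same effect can be had on the caller's side by checking for a directory before any stream is opened. The following is a minimal sketch under stated assumptions: DirectoryAwareDetector and its FileDetector interface are hypothetical names, the delegate stands in for whatever detection call the application already makes, and returning inode/directory is the convention proposed in this comment, not an existing Tika behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical caller-side guard; Tika itself is not changed.
final class DirectoryAwareDetector {

    // Stand-in for whatever detection call the application already uses.
    interface FileDetector {
        String detect(Path file) throws IOException;
    }

    // Value proposed in the comment above; not defined in tika-mimetypes.xml.
    static final String DIRECTORY_TYPE = "inode/directory";

    static String detect(Path path, FileDetector delegate) throws IOException {
        if (Files.isDirectory(path)) {
            return DIRECTORY_TYPE; // the directory is never opened as a stream
        }
        return delegate.detect(path);
    }
}
```

Keeping the check outside the detector means regular files follow exactly the same path as before, and only directories are short-circuited.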
[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type
[ https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119983#comment-14119983 ]

Nick Burch commented on TIKA-1409:
--
I believe that inodes are a Unix-specific thing, so that mime type is perhaps not a totally generic one for a directory.
Re: [VOTE] Release Apache Tika 1.6 RC #2
Hi Chris,

On 1 Sep 2014, at 06:16, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

> [ ] +1 Release this package as Apache Tika 1.6

+1 from me, working fine in a couple of projects I use it in.

Thanks for sticking with this one, Chris!

Cheers,
Dave
Re: [VOTE] Release Apache Tika 1.6 RC #2
Hi

+1

Thanks, Sergey

On 1 Sep 2014, at 06:16, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

> [ ] +1 Release this package as Apache Tika 1.6