Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-03 Thread Lewis John Mcgibbney
Hi Folks,
OK I started this two days ago... here I finish up.

On Mon, Sep 1, 2014 at 9:39 AM, dev-digest-h...@tika.apache.org wrote:


 A candidate for the Tika 1.6 release is available at:

 http://people.apache.org/~mattmann/apache-tika-1.6/rc2/



So I checked out all the artifacts and all are fine except for
-rw-r--r--  1 lmcgibbn  admin  36133725 Aug 31 21:55 tika-server-1.6.jar
-rw-r--r--  1 lmcgibbn  admin       243 Aug 31 21:55 tika-server-1.6.jar.asc
-rw-r--r--  1 lmcgibbn  admin        68 Aug 31 22:14 tika-server-1.6.jar.sha1
which does not have an md5... but I feel that this is not a blocker, as I
verified everything else and it is A-OK.
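
For anyone repeating the check on an artifact that ships only a .sha1 file, here is a minimal sketch in plain Java (JDK only) of comparing a file against its published SHA-1 digest. The class and method names are hypothetical and not part of Tika or the release tooling. It assumes the .sha1 file starts with the hex digest, optionally followed by whitespace and a file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Tika or its release scripts.
final class Sha1Check {

    // Streams the file through SHA-1 and returns the lowercase hex digest.
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading drives the digest; nothing else to do here.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumption: the .sha1 file begins with the hex digest,
    // optionally followed by whitespace and a file name.
    static boolean matches(Path artifact, Path sha1File)
            throws IOException, NoSuchAlgorithmException {
        String content = new String(Files.readAllBytes(sha1File), StandardCharsets.UTF_8).trim();
        String expected = content.split("\\s+")[0];
        return expected.equalsIgnoreCase(sha1Hex(artifact));
    }
}
```

Calling Sha1Check.matches with the paths of tika-server-1.6.jar and tika-server-1.6.jar.sha1 would return true when the digests agree. It does not replace checking the .asc signature.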



 The release candidate is a zip archive of the sources in:

 http://svn.apache.org/repos/asf/tika/tags/1.6-rc2/


mvn clean install is fine on the tag locally



 https://repository.apache.org/content/repositories/orgapachetika-1004/


I used Crawler Commons in a test project for sitemap parsing; the staged
artifact for Tika looks great.





 Please vote on releasing this package as Apache Tika 1.6.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Tika PMC votes are cast.





 [X ] +1 Release this package as Apache Tika 1.6

Thanks to Chris, amongst others, for the persistence on this release.
Lewis


tika-trunk-jdk1.7 - Build # 192 - Failure

2014-09-03 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #192)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/192/ to 
view the results.

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2014-09-03 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119745#comment-14119745
 ] 

Tim Allison commented on TIKA-1330:
---

Looks like the ballpark estimate of processing time on TIKA-1302 was about
right.  I just finished a complete run of govdocs1 (~1 million files) on an
8-CPU VM with 8 GB available, -Xmx4g.  The run used 15 consumers and completed
in about 4 hours.  The driver restarted the process thirteen times (6 permanent
hangs and 7 OOM).

 Add robust tika-batch code
 --

 Key: TIKA-1330
 URL: https://issues.apache.org/jira/browse/TIKA-1330
 Project: Tika
  Issue Type: Sub-task
  Components: cli, general, server
Reporter: Tim Allison
Assignee: Tim Allison

 In my current design plan, I see creating a separate component tika-batch 
 that includes a small bit of configurable code to run Tika against a large 
 batch of documents.  This code should be robust against OOM and hangs, and it 
 should have fairly robust logging.
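
To make the "robust against OOM and hangs" idea concrete, here is a minimal sketch, using only the JDK, of running one unit of work under a timeout and classifying the outcome. It is not the tika-batch design. TimedTask, Outcome and the timeout parameter are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; not taken from tika-batch.
final class TimedTask {

    enum Outcome { OK, TIMED_OUT, FAILED, OUT_OF_MEMORY }

    // Runs the task on a worker thread and waits at most timeoutMillis for it.
    static Outcome run(Callable<?> task, long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> future = pool.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Interrupt the worker; a task that ignores interrupts keeps running.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // Errors thrown by the task, including OutOfMemoryError, arrive as the cause.
            return (e.getCause() instanceof OutOfMemoryError)
                    ? Outcome.OUT_OF_MEMORY
                    : Outcome.FAILED;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A thread that never responds to interruption cannot be stopped from inside the same JVM, and a JVM that has hit OOM may not be in a trustworthy state, so an in-process guard like this is only a partial measure. A separate driver that restarts the worker process covers the cases this cannot.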



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TIKA-1409) Error asking for a directory mime-type

2014-09-03 Thread Piero Ottuzzi (JIRA)
Piero Ottuzzi created TIKA-1409:
---

 Summary: Error asking for a directory mime-type
 Key: TIKA-1409
 URL: https://issues.apache.org/jira/browse/TIKA-1409
 Project: Tika
  Issue Type: Bug
  Components: general
Affects Versions: 1.5
 Environment: Windows 7 and JDK 1.8
Reporter: Piero Ottuzzi


Hi there,

   Just out of curiosity, I used the code you can find at the end of the
Content and language detection page [1] to get the Tika mimetype for a
directory.
I tried it on a well-known directory (System.getProperty("user.home")) and got:
java.io.FileNotFoundException: C:\Users\2913 (Access is denied)
    at java.io.FileInputStream.open(Native Method) ~[na:1.8.0_11]
    at java.io.FileInputStream.<init>(FileInputStream.java:131) ~[na:1.8.0_11]
    at org.apache.tika.io.TikaInputStream.<init>(TikaInputStream.java:444) ~[tika-core-1.5.jar:na]
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:231) ~[tika-core-1.5.jar:na]
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:212) ~[tika-core-1.5.jar:na]

Obviously the directory exists and it is readable.
Is this the expected behaviour?

Thanks
Bye
Piero

[1]http://tika.apache.org/1.5/detection.html





[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type

2014-09-03 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119912#comment-14119912
 ] 

Nick Burch commented on TIKA-1409:
--

Directories don't have mime types; only content does.

It looks like your JVM gives a somewhat confusing error message when you try
to open a directory as if it were a file. Overall, though, asking for the mime
type of a directory will never work, so I'm not sure to what extent we want to
add a special error message.
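
For illustration, this is roughly what a "special error message" could look like in calling code: a minimal sketch, using only the JDK, that checks for a directory before opening a stream. It is not Tika's behaviour, and RegularFileOpener is a hypothetical name.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard for illustration; not part of Tika.
final class RegularFileOpener {

    // Opens the path for reading, failing early with a clear message for directories.
    static InputStream open(Path path) throws IOException {
        if (Files.isDirectory(path)) {
            throw new FileNotFoundException(
                    path + " is a directory, not a file with content");
        }
        return Files.newInputStream(path);
    }
}
```

The caller still gets a FileNotFoundException, but the message no longer depends on how the platform reports an attempt to open a directory.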



[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type

2014-09-03 Thread Piero Ottuzzi (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119949#comment-14119949
 ] 

Piero Ottuzzi commented on TIKA-1409:
-

Hi,

   I agree that this is almost nonsense, but a Google search shows that on
many Linux distros the mime type for a directory is inode/directory, and I
cannot find it in tika-mimetypes.xml [1].
So the test was done only to see what Apache Tika would print, and I was a bit
surprised by the result.
Do you think it is worth adding inode/directory to Tika as a mime type for
directories?
It is a simple, though not fully RFC-compliant, way to handle this corner case.

Thanks 
Bye
Piero

[1]http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
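
A minimal sketch of the proposal above, done in calling code rather than in tika-mimetypes.xml: report inode/directory for directories and hand everything else to a content detector. DirectoryAwareDetector is a hypothetical name, and the detector is passed in as a plain function so the example does not assume any particular Tika API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical wrapper for illustration; not part of Tika.
final class DirectoryAwareDetector {

    // The type many Linux desktops report for directories; not an RFC-registered type.
    static final String DIRECTORY_TYPE = "inode/directory";

    // Returns inode/directory for directories, otherwise whatever the given detector says.
    static String detect(Path path, Function<Path, String> contentDetector) {
        if (Files.isDirectory(path)) {
            return DIRECTORY_TYPE;
        }
        return contentDetector.apply(path);
    }
}
```

With this shape the content detector is never asked to open a directory, so the FileNotFoundException from the report does not occur for that case.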



[jira] [Commented] (TIKA-1409) Error asking for a directory mime-type

2014-09-03 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119983#comment-14119983
 ] 

Nick Burch commented on TIKA-1409:
--

I believe that inodes are a Unix-specific thing, so that mime type is perhaps
not a totally generic one for a directory.



Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-03 Thread David Meikle
Hi Chris,

On 1 Sep 2014, at 06:16, Mattmann, Chris A (3980) 
chris.a.mattm...@jpl.nasa.gov wrote:

[ ] +1 Release this package as Apache Tika 1.6

+1 from me, working fine in a couple of projects I use it in.  Thanks for 
sticking with this one Chris!

Cheers,
Dave

Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-03 Thread Sergey Beryozkin

Hi

+1

Thanks, Sergey

On 1 Sep 2014, at 06:16, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:


   [ ] +1 Release this package as Apache Tika 1.6