[jira] [Commented] (TIKA-1514) http-equiv content-type extraction should pick first parseable content value

2015-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277503#comment-14277503 ] Tim Allison commented on TIKA-1514: I dug into this a bit. It will take more effort
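The TIKA-1514 summary states a simple rule: when a page declares several http-equiv content-type values, use the first one whose charset can actually be parsed. Below is a minimal Java sketch of that rule. The class HttpEquivCharset and its methods are hypothetical and are not Tika's HTML encoding detection code. Treating "parseable" as "accepted by java.nio.charset.Charset.forName" is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not Tika's actual code): given the content values of
 * every <meta http-equiv="content-type"> tag in document order, return the
 * canonical name of the first charset the JVM can parse, or null.
 */
public class HttpEquivCharset {

    /** Extracts the value after "charset=" from a content value, or null if absent. */
    static String charsetParam(String content) {
        if (content == null) {
            return null;
        }
        for (String part : content.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                return p.substring("charset=".length()).trim().replace("\"", "").replace("'", "");
            }
        }
        return null;
    }

    /** Returns the first parseable charset name, or null if none parses. */
    static String firstParseable(List<String> contentValues) {
        for (String content : contentValues) {
            String name = charsetParam(content);
            if (name == null || name.isEmpty()) {
                continue; // no charset parameter: try the next value
            }
            try {
                return Charset.forName(name).name();
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // malformed or unknown charset: fall through to the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "text/html",                         // no charset parameter
                "text/html; charset=not-a-charset",  // unknown charset
                "text/html; charset=ISO-8859-1");    // first parseable value
        System.out.println(firstParseable(values)); // prints ISO-8859-1
    }
}
```

The rule skips bad declarations instead of failing on the first one. An unknown value such as not-a-charset is passed over, and detection moves on to the next candidate.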

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread David Meikle
Hi Tyler, On 9 Jan 2015, at 22:02, Tyler Palsulich tpalsul...@apache.org wrote: A candidate for the Tika 1.7 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in:

[jira] [Created] (TIKA-1517) MIME type selection with probability

2015-01-14 Thread Shuai Liu (JIRA)
Shuai Liu created TIKA-1517: --- Summary: MIME type selection with probability Key: TIKA-1517 URL: https://issues.apache.org/jira/browse/TIKA-1517 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1517) MIME type selection with probability

2015-01-14 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1517: Description: Problem and intuition The original implementation in MIME type determination is a bit less

[jira] [Issue Comment Deleted] (TIKA-1517) MIME type selection with probability

2015-01-14 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1517: Comment: was deleted (was: Proposed design: The idea of selection is to incorporate probability as weights

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-01-14 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278190#comment-14278190 ] Shuai Liu commented on TIKA-1517: Proposed design: The idea of selection is to incorporate
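In TIKA-1517, Shuai Liu's proposed design incorporates probability as weights when choosing a MIME type. Below is one possible reading of that idea in Java, built on stated assumptions. Three evidence sources (magic bytes, file extension, declared metadata) each vote for a type, and each source carries a fixed weight. The type with the highest summed weight wins. The class name, source names, and weights are all hypothetical. None of them come from the truncated proposal.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of probability-weighted MIME type selection.
 * Source names ("magic", "extension", "metadata") and weights are
 * illustrative assumptions, not values from the TIKA-1517 proposal.
 */
public class WeightedMimeSelector {

    private static final String FALLBACK = "application/octet-stream";

    private final Map<String, Double> sourceWeights = new HashMap<>();

    WeightedMimeSelector(double magicWeight, double extensionWeight, double metadataWeight) {
        sourceWeights.put("magic", magicWeight);
        sourceWeights.put("extension", extensionWeight);
        sourceWeights.put("metadata", metadataWeight);
    }

    /**
     * Sums the weight of every source that voted for each type and returns
     * the highest-scoring type, or FALLBACK if no known source voted.
     */
    String select(Map<String, String> votes) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, String> vote : votes.entrySet()) {
            Double weight = sourceWeights.get(vote.getKey());
            if (weight == null || vote.getValue() == null) {
                continue; // unknown source, or the source abstained
            }
            scores.merge(vote.getValue(), weight, Double::sum);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double score = e.getValue();
            // Break ties on the type name so the result does not depend on HashMap order.
            if (score > bestScore || (score == bestScore && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best == null ? FALLBACK : best;
    }

    public static void main(String[] args) {
        WeightedMimeSelector selector = new WeightedMimeSelector(0.4, 0.35, 0.25);
        String docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        Map<String, String> votes = new HashMap<>();
        votes.put("magic", "application/zip"); // 0.4
        votes.put("extension", docx);          // 0.35
        votes.put("metadata", docx);           // 0.25, so docx totals 0.6
        System.out.println(selector.select(votes)); // prints the docx type
    }
}
```

With weights 0.4, 0.35 and 0.25, two agreeing weaker sources (0.6 together) outvote the magic-byte guess (0.4). In a real design the weights would presumably be tuned or estimated rather than hard-coded.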

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276966#comment-14276966 ] Konstantin Gribov commented on TIKA-241: Don't we need to add licensing info about

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Hong-Thai Nguyen
I've checked some regression tests again. Seems fine to me too. So +1. Great job Tyler! On Fri, Jan 9, 2015 at 11:02 PM, Tyler Palsulich tpalsul...@apache.org wrote: Hi All, A candidate for the Tika 1.7 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276936#comment-14276936 ] Nick Burch commented on TIKA-1509: Passing a strategy to CompositeParser, then having

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-14 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277044#comment-14277044 ] Luis Filipe Nassif commented on TIKA-1511: 1) I vote to handle each table as a

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-14 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277080#comment-14277080 ] Konstantin Gribov commented on TIKA-1511: [~talli...@mitre.org], working with

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277091#comment-14277091 ] Nick Burch commented on TIKA-241: In r1651709, I've added the unrar license to what I

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Tyler Palsulich
Thanks everyone! I'll close off this VOTE and roll the release tomorrow morning. Nick, thanks for building the site! We still need to rebuild the index, right? Tyler On Wed, Jan 14, 2015 at 8:37 AM, Allison, Timothy B. talli...@mitre.org wrote: +1 Built successfully on both Windows 7 and

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Mattmann, Chris A (3980)
+1 to release. GPG sigs and checksums good (after import of tika.asc). Great work Tyler and team! Cheers, Chris [chipotle:~/tmp/apache-tika-1.7-rc3] mattmann% $HOME/bin/stage_apache_rc tika 1.7-src https://dist.apache.org/repos/dist/dev/tika/ % Total % Received % Xferd Average Speed Time

RE: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Allison, Timothy B.
+1 Built successfully on both Windows 7 and RHEL 6.5 for me...no Tesseract installed. Relying on post rc2 release eval for TIKA 1445 against trunk for no new regressions. Manually confirmed image metadata is being extracted. Thank you, Tyler! Best, Tim

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Thomas Ledoux
+1, works for me 2015-01-13 9:23 GMT+01:00 Tyler Palsulich tpalsul...@gmail.com: Hi Folks, Let's mark this RC#2 as failed and shift the vote to the updated RC#3 (http://markmail.org/message/m5gpgmr7hedgpjdj), which has Tesseract metadata fixes and David's test fix. Thanks, Tyler On

Tika 2.0 discussion

2015-01-14 Thread Allison, Timothy B.
All, I just started a wiki page for our discussion of Tika 2.0 (https://wiki.apache.org/tika/Tika2_0RoadMap). Please modify/edit/discuss as you see fit. On a related note, I also started a wiki page for our CompositeParser strategy discussion

Tika wiki access

2015-01-14 Thread Konstantin Gribov
Hello. Can you give me write access to Tika wiki, please? My account there is KonstantinGribov (email is same, gros...@gmail.com). -- Best regards, Konstantin Gribov

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276916#comment-14276916 ] Tim Allison commented on TIKA-1513: From a brochure-level evaluation :), I'd prefer

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276818#comment-14276818 ] Hudson commented on TIKA-241: SUCCESS: Integrated in tika-trunk-jdk1.7 #428 (See

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276831#comment-14276831 ] Hudson commented on TIKA-241: SUCCESS: Integrated in tika-trunk-jdk1.6 #413 (See

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276833#comment-14276833 ] Tim Allison commented on TIKA-1509: Y, I agree on compatibility. How about we add a

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276866#comment-14276866 ] Hudson commented on TIKA-241: SUCCESS: Integrated in tika-trunk-jdk1.7 #429 (See

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276849#comment-14276849 ] Nick Burch commented on TIKA-241: Thanks, applied with a few tweaks (mostly for the OSGi

[jira] [Resolved] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-241. Resolution: Fixed Fix Version/s: 1.8 Rar archive support

[jira] [Updated] (TIKA-1516) Downgrade Rome dependency to 0.9 to avoid nasty NPE

2015-01-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1516: Attachment: TIKA-1516.patch Small change. I apologize for not including a test case. I

[jira] [Updated] (TIKA-1515) Old XLS 3 parsing is not working on some documents

2015-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1515: Description: Thanks to [~gagravarr], we now have mime type id for excel.sheet.4 and excel.sheet.3, and