[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341529#comment-14341529 ] Hudson commented on TIKA-1558: SUCCESS: Integrated in tika-trunk-jdk1.7 #513 (See

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341555#comment-14341555 ] Hudson commented on TIKA-1509: SUCCESS: Integrated in tika-trunk-jdk1.7 #514 (See

[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341556#comment-14341556 ] Hudson commented on TIKA-1558: SUCCESS: Integrated in tika-trunk-jdk1.7 #514 (See

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1561: Fix Version/s: 1.8 GCMD Directory Interchange Format (.dif) identification

[jira] [Commented] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341671#comment-14341671 ] Chris A. Mattmann commented on TIKA-1561: OK applied the Pull Request, cleaned up

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-02-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341683#comment-14341683 ] Chris A. Mattmann commented on TIKA-1509: Fantastic, Tyler, great summary.

[jira] [Commented] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341747#comment-14341747 ] Hudson commented on TIKA-1561: ABORTED: Integrated in tika-trunk-jdk1.7 #515 (See

[jira] [Comment Edited] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341674#comment-14341674 ] Chris A. Mattmann edited comment on TIKA-1561 at 2/28/15 5:31 PM:

[jira] [Commented] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-28 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341673#comment-14341673 ] ASF GitHub Bot commented on TIKA-1561: Github user asfgit closed the pull request at:

[jira] [Closed] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-539. Resolution: Fixed Encoding detection is too biased by encoding in meta tag

Re: Curating Issues

2015-02-28 Thread Mattmann, Chris A (3980)
Hey Tyler if you want to take a whack, here are some criteria I tend to use: 1. Bug report from 1+ years old. - Close it - either not reproducible, fixed in a later version and not come back to, or not as bad of a bug anymore since it’s not a blocker. 2. Feature request from 1+ years old that

[jira] [Closed] (TIKA-307) Better handling of partial/truncated input data to parsers

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-307. Resolution: Fixed Zip and other type Parsers are much more robust at this point. Can reopen if still

[jira] [Closed] (TIKA-89) Rename MimeType and MimeTypes

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-89. Resolution: Fixed Rename MimeType and MimeTypes Key:

[jira] [Reopened] (TIKA-289) Add magic byte patterns from file(1)

2015-02-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reopened TIKA-289: I've just checked, and there are actually a handful of mime types defined in the file magic which we don't

[jira] [Commented] (TIKA-465) LanguageIdentifier API enhancements

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342023#comment-14342023 ] Tyler Palsulich commented on TIKA-465: Is there still interest in implementing this?

[jira] [Closed] (TIKA-590) Create facility for deeper introspection of media files

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-590. Resolution: Won't Fix Create facility for deeper introspection of media files

[jira] [Updated] (TIKA-579) DcXMLParser: DC metadata text in extracted body

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-579: Affects Version/s: 1.8 (was: 0.8) DcXMLParser: DC metadata text in

[jira] [Commented] (TIKA-579) DcXMLParser: DC metadata text in extracted body

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342031#comment-14342031 ] Tyler Palsulich commented on TIKA-579: +1. DC tags should be put into the Metadata.

[jira] [Resolved] (TIKA-577) IndexOutOfBounds Exception looking for Picture in Word 03 doc that has no pictures

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-577. Resolution: Not a Problem The document is corrupted. The POI error is now {{Caused by:

[jira] [Commented] (TIKA-291) Adobe InDesign support

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341982#comment-14341982 ] Tyler Palsulich commented on TIKA-291: We still don't have support for this, but it

[jira] [Updated] (TIKA-90) Allow thumbnails as document metadata

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-90: Priority: Minor (was: Major) Allow thumbnails as document metadata

[jira] [Commented] (TIKA-89) Rename MimeType and MimeTypes

2015-02-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341984#comment-14341984 ] Nick Burch commented on TIKA-89: I think this might have already been done? The

[jira] [Closed] (TIKA-354) ProfilingHandler should take a length-limiting parameter

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-354. Resolution: Not a Problem Closing this off, unless you're still interested in getting it in,

[jira] [Commented] (TIKA-375) Improve code quality metrics

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341999#comment-14341999 ] Tyler Palsulich commented on TIKA-375: This is a great candidate for any new

[jira] [Updated] (TIKA-375) Improve code quality metrics

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-375: Labels: newbie (was: ) Improve code quality metrics

[jira] [Commented] (TIKA-381) HtmlParser should strip linefeeds out of links

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342009#comment-14342009 ] Tyler Palsulich commented on TIKA-381: This is still an issue in 1.8-SNAPSHOT.

[jira] [Updated] (TIKA-381) HtmlParser should strip linefeeds out of links

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-381: Affects Version/s: 1.8 (was: 0.6) HtmlParser should strip linefeeds out

[jira] [Closed] (TIKA-272) Expose characters offsets information while parsing text-based inputs.

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-272. Resolution: Won't Fix Expose characters offsets information while parsing text-based inputs.

[jira] [Closed] (TIKA-288) Support override parsers in AutoDetectParser

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-288. Resolution: Duplicate Support override parsers in AutoDetectParser

[jira] [Commented] (TIKA-289) Add magic byte patterns from file(1)

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341980#comment-14341980 ] Tyler Palsulich commented on TIKA-289: Does anyone know if Tika integrated the magic

[jira] [Commented] (TIKA-94) Speech recognition

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341978#comment-14341978 ] Tyler Palsulich commented on TIKA-94: This is similar to machine text translation in

[jira] [Commented] (TIKA-369) Improve accuracy of language detection

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341994#comment-14341994 ] Tyler Palsulich commented on TIKA-369: Is there any update on this? Language detection

[jira] [Commented] (TIKA-289) Add magic byte patterns from file(1)

2015-02-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341993#comment-14341993 ] Nick Burch commented on TIKA-289: There are a few issues with integrating it: * Very few

[jira] [Commented] (TIKA-524) Unification of HTML output from Office, OOXML and Open Document parsers

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342026#comment-14342026 ] Tyler Palsulich commented on TIKA-524: Is there still interest/a possibility of

[jira] [Closed] (TIKA-497) HtmlHandler should fix up incorrect capitalization of names in meta http-equiv=xxx attributes before putting into metadata

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-497. Resolution: Fixed HtmlHandler should fix up incorrect capitalization of names in meta

[jira] [Closed] (TIKA-289) Add magic byte patterns from file(1)

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-289. Resolution: Won't Fix I agree, [~gagravarr]. Let's consider {{file}} as a reference when we need help

[jira] [Commented] (TIKA-289) Add magic byte patterns from file(1)

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342005#comment-14342005 ] Tyler Palsulich commented on TIKA-289: Sounds great! Add magic byte patterns from

[jira] [Commented] (TIKA-591) Separate launcher process for forking JVMs

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342034#comment-14342034 ] Tyler Palsulich commented on TIKA-591: Is there still interest in this, or is it

[jira] [Commented] (TIKA-1479) Build a parser to extract data from .iso19139 format

2015-02-28 Thread Gautham Gowrishankar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341942#comment-14341942 ] Gautham Gowrishankar commented on TIKA-1479: Hi guys, I am continuing on the

[jira] [Closed] (TIKA-100) Structured PDF parsing

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-100. Resolution: Fixed Structured PDF parsing Key: TIKA-100

[jira] [Commented] (TIKA-89) Rename MimeType and MimeTypes

2015-02-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341976#comment-14341976 ] Tyler Palsulich commented on TIKA-89: Is there still interest in renaming these?