[jira] [Resolved] (TIKA-2006) Add mime detection for vCalendar and iCalendar

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2006. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Add mime detection for vCalendar

[jira] [Updated] (TIKA-2007) Tika 1.13 uses vulnerable version of jackson-core: CVE-2016-3720

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2007: -- Priority: Blocker (was: Major) > Tika 1.13 uses vulnerable version of jackson-core: CVE-2016-3720 >

[jira] [Resolved] (TIKA-2005) Add mime detection for vcard

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2005. --- Resolution: Not A Problem Exists now; but apparently didn't when I ran the comparisons btwn Tika and

[jira] [Commented] (TIKA-2004) Add mime detection for Windows Media Metafile, PRONOM: application/x-puid-fmt-584

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331582#comment-15331582 ] Tim Allison commented on TIKA-2004: --- It looks like ".asx" is currently detected as "video/x-ms-asf".

[jira] [Resolved] (TIKA-2004) Add mime detection for Windows Media Metafile, PRONOM: application/x-puid-fmt-584

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2004. --- Resolution: Fixed Fix Version/s: 1.14 2.0 Thank you, [~gagravarr]! > Add

[jira] [Created] (TIKA-2009) Add magic for djvu

2016-06-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2009: - Summary: Add magic for djvu Key: TIKA-2009 URL: https://issues.apache.org/jira/browse/TIKA-2009 Project: Tika Issue Type: Improvement Reporter: Tim

[jira] [Resolved] (TIKA-2009) Add magic for djvu

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2009. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Add magic for djvu >

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332164#comment-15332164 ] Tim Allison commented on TIKA-1986: --- bq. Ah! I missed that part. Please suggest what to do in that case?

[jira] [Comment Edited] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328074#comment-15328074 ] Tim Allison edited comment on TIKA-1986 at 6/13/16 7:59 PM: bq. Those

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328074#comment-15328074 ] Tim Allison commented on TIKA-1986: --- bq. Those annotations will go on individual setters/fields. example

[jira] [Commented] (TIKA-1999) org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:58)

2016-06-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320577#comment-15320577 ] Tim Allison commented on TIKA-1999: --- As a temporary workaround, you can increase your stack size: -Xss4m.
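
The `-Xss4m` workaround above raises the default thread stack size for the whole JVM. Where changing JVM flags is not practical, a related option is to run the deeply recursive work on one thread that requests a larger stack. The sketch below is a plain-Java illustration under that assumption and is not Tika code: the class name, method names, and the recursive `depth` stand-in are hypothetical, and the stack size passed to the `Thread` constructor is only a hint that a JVM may ignore.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: runs a deeply recursive task on a thread that asks for a larger stack.
class BigStackRunner {

    // Stand-in for deeply nested processing; each level adds one stack frame.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    // Runs depth(levels) on a dedicated thread with the requested stack size in bytes.
    static int runWithStack(long stackBytes, int levels) {
        AtomicReference<Integer> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(depth(levels));
            } catch (Throwable t) {
                // Includes StackOverflowError if the requested stack is still too small.
                failure.set(t);
            }
        }, "big-stack-worker", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("worker failed", failure.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        // 256 MB is an arbitrary, generous value chosen for the demonstration.
        System.out.println(runWithStack(256L * 1024 * 1024, 50_000));
    }
}
```

Either approach only postpones the overflow for sufficiently deep input, which is presumably why the comment calls it temporary.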

[jira] [Commented] (TIKA-1990) Broken .jpg inline image from .pdf files

2016-05-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305682#comment-15305682 ] Tim Allison commented on TIKA-1990: --- Will take a look on Tuesday. Thank you for opening this issue and

[jira] [Assigned] (TIKA-1990) Broken .jpg inline image from .pdf files

2016-05-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1990: - Assignee: Tim Allison > Broken .jpg inline image from .pdf files >

[jira] [Commented] (TIKA-1991) Incorporate latest version of bouncy castle library

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308835#comment-15308835 ] Tim Allison commented on TIKA-1991: --- I'm likely missing something, but 1.45 is the latest, no? I don't

[jira] [Commented] (TIKA-1991) Incorporate latest version of bouncy castle library

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308846#comment-15308846 ] Tim Allison commented on TIKA-1991: --- Wait. Sorry. We're at 1.54 in 1.13 and trunk. > Incorporate latest

[jira] [Commented] (TIKA-1978) Invocation of java.net.URL.equals(Object), which blocks to do domain name resolution, in org.apache.tika.parser.geo.topic.GeoParser.initialize(URL)

2016-05-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302240#comment-15302240 ] Tim Allison commented on TIKA-1978: --- [~lewismc], mind making this fix on the 2.x branch if you haven't
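
The issue title refers to a well-known property of `java.net.URL.equals(Object)`: it may resolve host names, so it can block on DNS. A common way to avoid that is to compare `java.net.URI` values, which are compared as text without any lookup. The following is a minimal sketch of that idea; it does not show the change actually made in `GeoParser`, and the class and helper names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Illustrative only: compares URLs without the host resolution that URL.equals may perform.
class UrlCompare {

    // Builds a URL from a string, wrapping the checked exception for convenience.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + spec, e);
        }
    }

    // True when both URLs denote the same location according to URI.equals,
    // which needs no network access.
    static boolean sameLocation(URL a, URL b) {
        if (a == null || b == null) {
            return a == b;
        }
        try {
            return a.toURI().equals(b.toURI());
        } catch (URISyntaxException e) {
            // Fall back to a plain string comparison if a URL is not a valid URI.
            return a.toExternalForm().equals(b.toExternalForm());
        }
    }

    public static void main(String[] args) {
        URL first = url("http://example.invalid/model.bin");
        URL second = url("http://example.invalid/model.bin");
        System.out.println(sameLocation(first, second));
    }
}
```

The trade-off is that two different host names pointing at the same address are treated as different locations, which is usually the desired behaviour for a cache or change check.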

[jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration

2016-05-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304047#comment-15304047 ] Tim Allison commented on TIKA-1508: --- bq. Whats our take on multivalued params (aka Arrays of integers

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-05-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304061#comment-15304061 ] Tim Allison commented on TIKA-1986: --- The other major thing...I realize looking back at our discussion.

[jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration

2016-05-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304051#comment-15304051 ] Tim Allison commented on TIKA-1508: --- Thank you for getting this conversation going again! > Add

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-05-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304041#comment-15304041 ] Tim Allison commented on TIKA-1986: --- Wow. This will be so cool. I made one trivial recommendation on

[jira] [Resolved] (TIKA-1991) Incorporate latest version of bouncy castle library

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1991. --- Resolution: Not A Problem We're up to date in 1.13 and trunk. > Incorporate latest version of bouncy

[jira] [Commented] (TIKA-1991) Incorporate latest version of bouncy castle library

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308948#comment-15308948 ] Tim Allison commented on TIKA-1991: --- Oh, ha. That's a vintage version. Make sure to upgrade

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308952#comment-15308952 ] Tim Allison commented on TIKA-1986: --- Unless there are objections, let's create a dev branch (tika-1986?)

[jira] [Assigned] (TIKA-1994) Integrate OCR with PDFParser

2016-06-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1994: - Assignee: Tim Allison > Integrate OCR with PDFParser > > >

[jira] [Created] (TIKA-1994) Integrate OCR with PDFParser

2016-06-02 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1994: - Summary: Integrate OCR with PDFParser Key: TIKA-1994 URL: https://issues.apache.org/jira/browse/TIKA-1994 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1994) Integrate OCR with PDFParser

2016-06-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1994: -- Description: Users can now run OCR on individual images embedded inline in PDFs if they get the

[jira] [Commented] (TIKA-1994) Integrate OCR with PDFParser

2016-06-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312543#comment-15312543 ] Tim Allison commented on TIKA-1994: --- Pushed to trunk. Still have to integrate with 2.x. I made some

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-05-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302076#comment-15302076 ] Tim Allison commented on TIKA-1513: --- [~iryndin], would you mind if we added your test files (tir_im.dbf,

[jira] [Created] (TIKA-1992) Check for duplicate inline images via COSStream not name in PDFParser

2016-06-01 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1992: - Summary: Check for duplicate inline images via COSStream not name in PDFParser Key: TIKA-1992 URL: https://issues.apache.org/jira/browse/TIKA-1992 Project: Tika

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311544#comment-15311544 ] Tim Allison commented on TIKA-1986: --- Great! Let's move development to the TIKA-1508 branch in asf's git?

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311642#comment-15311642 ] Tim Allison commented on TIKA-1986: --- I created the branch. Have at it! > support parser parameters with

[jira] [Resolved] (TIKA-1990) Broken .jpg inline image from .pdf files

2016-05-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1990. --- Resolution: Fixed Fix Version/s: 1.14 2.0 Needed to add JPEG filters when

[jira] [Resolved] (TIKA-2008) Add mime detection (and parser?) for MSOffice Owner File (PRONOM fmt/473)

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2008. --- Resolution: Fixed Fix Version/s: 1.14 2.0 There's some room for improvement

[jira] [Created] (TIKA-2011) Add mime detection for Endnote Import File (PRONOM: fmt/328)

2016-06-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2011: - Summary: Add mime detection for Endnote Import File (PRONOM: fmt/328) Key: TIKA-2011 URL: https://issues.apache.org/jira/browse/TIKA-2011 Project: Tika Issue

[jira] [Resolved] (TIKA-2011) Add mime detection for Endnote Import File (PRONOM: fmt/328)

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2011. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Add mime detection for Endnote

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2016-06-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344006#comment-15344006 ] Tim Allison commented on TIKA-1358: --- https://github.com/evernote/iwana/issues/1#issuecomment-227343502

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2016-06-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344346#comment-15344346 ] Tim Allison commented on TIKA-1358: --- What mime do we want to use for these?

[jira] [Created] (TIKA-2013) Upgrade to POI 3.15-beta2 when available

2016-06-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2013: - Summary: Upgrade to POI 3.15-beta2 when available Key: TIKA-2013 URL: https://issues.apache.org/jira/browse/TIKA-2013 Project: Tika Issue Type: Improvement

[jira] [Comment Edited] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332209#comment-15332209 ] Tim Allison edited comment on TIKA-1986 at 6/15/16 6:57 PM: From Nick bq. > I


[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332209#comment-15332209 ] Tim Allison commented on TIKA-1986: --- From Nick bq. > I think that's exactly what ParseContext should be

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332326#comment-15332326 ] Tim Allison commented on TIKA-1986: --- Ok, I just reverted ParseContext to what it is in trunk, and I added

[jira] [Reopened] (TIKA-995) XHTMLContentHandler doesn't pass attributes of body element

2016-06-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-995: -- Assignee: (was: Tyler Palsulich) This leads to doubling of the body tag in the output of HTMLParser

[jira] [Comment Edited] (TIKA-995) XHTMLContentHandler doesn't pass attributes of body element

2016-06-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336616#comment-15336616 ] Tim Allison edited comment on TIKA-995 at 6/17/16 6:22 PM: --- This leads to doubling

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2016-06-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345021#comment-15345021 ] Tim Allison commented on TIKA-1358: --- Y, very different. Done. Thank you! > Add support for newer iWork

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2016-06-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345031#comment-15345031 ] Tim Allison commented on TIKA-1358: --- Before I forget...I figured out that we can tell the diff btwn a

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-06-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339446#comment-15339446 ] Tim Allison commented on TIKA-1836: --- [~richa], any updates on this? Still failing? > Convertion

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332585#comment-15332585 ] Tim Allison commented on TIKA-1986: --- The honor is all yours! I think this is ready for trunk. Probably

[jira] [Comment Edited] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332552#comment-15332552 ] Tim Allison edited comment on TIKA-1986 at 6/15/16 9:02 PM: Doh. Sorry.

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332552#comment-15332552 ] Tim Allison commented on TIKA-1986: --- Doh. Sorry. Didn't anticipate that. Isn't collaboration a good

[jira] [Commented] (TIKA-1986) support parser parameters with type (int, double, etc) in configuration XML file

2016-06-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332782#comment-15332782 ] Tim Allison commented on TIKA-1986: --- Sounds good. Go for it! > support parser parameters with type (int,

[jira] [Updated] (TIKA-2019) WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2019: -- Description: The xml generated by these parsers was good, but when using the ToTextHandler, spaces/tabs

[jira] [Resolved] (TIKA-2019) WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2019. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > WordMLParser and

[jira] [Created] (TIKA-2020) Tika 2.0 - remove AbstractParser

2016-06-24 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2020: - Summary: Tika 2.0 - remove AbstractParser Key: TIKA-2020 URL: https://issues.apache.org/jira/browse/TIKA-2020 Project: Tika Issue Type: Task Reporter:

[jira] [Commented] (TIKA-2018) Attempt to get Title from Full text if not present in MetaData ( Application/Pdf )

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348286#comment-15348286 ] Tim Allison commented on TIKA-2018: --- bq. A vast majority of pdf documents don't fill meta information.

[jira] [Created] (TIKA-2019) WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler

2016-06-24 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2019: - Summary: WordMLParser and SpreadsheetMLParser incorrectly concatenate tokens with ToTextHandler Key: TIKA-2019 URL: https://issues.apache.org/jira/browse/TIKA-2019

[jira] [Updated] (TIKA-2017) Tika Server Cannot handle large files; add option for metadata only

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2017: -- Summary: Tika Server Cannot handle large files; add option for metadata only (was: Tika Server Cannot

[jira] [Comment Edited] (TIKA-2017) Tika Server Cannot handle large files

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348153#comment-15348153 ] Tim Allison edited comment on TIKA-2017 at 6/24/16 11:18 AM: - I thought I had

[jira] [Commented] (TIKA-2017) Tika Server Cannot handle large files

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348153#comment-15348153 ] Tim Allison commented on TIKA-2017: --- I thought I had documented this on our wiki, but it isn't there now.

[jira] [Updated] (TIKA-2020) Tika 2.0 - remove AbstractParser's 3 parameter parse

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2020: -- Summary: Tika 2.0 - remove AbstractParser's 3 parameter parse (was: Tika 2.0 - remove AbstractParser)

[jira] [Resolved] (TIKA-2020) Tika 2.0 - remove AbstractParser's 3 parameter parse

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2020. --- Resolution: Fixed I initially thought we could remove the AbstractParser entirely, but that contains

[jira] [Updated] (TIKA-2020) Tika 2.0 - remove AbstractParser's 3 parameter parse

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2020: -- Fix Version/s: 2.0 > Tika 2.0 - remove AbstractParser's 3 parameter parse >

[jira] [Updated] (TIKA-2020) Tika 2.0 - remove AbstractParser's 3 parameter parse

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2020: -- Description: If I understand correctly, AbstractParser was added to allow an easier transition from the

[jira] [Commented] (TIKA-2018) Attempt to get Title from Full text if not present in MetaData ( Application/Pdf )

2016-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348551#comment-15348551 ] Tim Allison commented on TIKA-2018: --- I'm not against implementing some basic heuristics based on font

[jira] [Resolved] (TIKA-1826) tika-app gui should print all metadata values, not just the first

2016-01-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1826. --- Resolution: Fixed In the gui, we're calling get() on the metadata object which only returns the first
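
The difference described above, a single-value getter hiding the remaining values of a multi-valued key, can be shown with a small self-contained stand-in. The class below is hypothetical and is not Tika's `Metadata` class; in Tika itself, `Metadata.get(name)` returns one value and `Metadata.getValues(name)` returns the full array, but whether the GUI fix used exactly that call is not stated in this entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; not Tika's Metadata class.
class MultiValueStore {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value under the given name, keeping earlier values.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns only the first value, or null when the name is unknown.
    String getFirst(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns every value stored under the name, in insertion order.
    List<String> getAll(String name) {
        List<String> list = values.get(name);
        return list == null ? Collections.emptyList() : Collections.unmodifiableList(list);
    }

    // Renders one display line that includes all values, not just the first.
    String render(String name) {
        return name + ": " + String.join(", ", getAll(name));
    }

    public static void main(String[] args) {
        MultiValueStore store = new MultiValueStore();
        store.add("dc:creator", "Ann");
        store.add("dc:creator", "Bob");
        System.out.println(store.getFirst("dc:creator"));
        System.out.println(store.render("dc:creator"));
    }
}
```

A display built on `getFirst` would silently drop the second creator, which matches the symptom named in the issue title.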

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-01-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091932#comment-15091932 ] Tim Allison commented on TIKA-1816: --- Y. Works. Thank you! {noformat} Using the first Proxy setting :

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-01-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091945#comment-15091945 ] Tim Allison commented on TIKA-1816: --- Will commit in the next few minutes, unless Chris wants to? >

[jira] [Created] (TIKA-1828) Upgrade to POI 3.14-Beta2 when available

2016-01-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1828: - Summary: Upgrade to POI 3.14-Beta2 when available Key: TIKA-1828 URL: https://issues.apache.org/jira/browse/TIKA-1828 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1827) Error is printed on stderr when parsing some ppt files

2016-01-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089535#comment-15089535 ] Tim Allison commented on TIKA-1827: --- Thank you for raising this. I just fixed this over on POI. The fix

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-01-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092003#comment-15092003 ] Tim Allison commented on TIKA-1816: --- committed in trunk r1724034. need to rework ever so slightly to

[jira] [Comment Edited] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098465#comment-15098465 ] Tim Allison edited comment on TIKA-1830 at 1/14/16 5:50 PM: Y, 074531.pdf has

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098515#comment-15098515 ] Tim Allison commented on TIKA-1830: --- Doh. Right. Thank you. > Upgrade to PDFBox 1.8.11 when available >

[jira] [Comment Edited] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098465#comment-15098465 ] Tim Allison edited comment on TIKA-1830 at 1/14/16 5:37 PM: Y, 074531.pdf has

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098465#comment-15098465 ] Tim Allison commented on TIKA-1830: --- Y, 074531.pdf has uncovered a Tika issue. I can reproduce the

[jira] [Comment Edited] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096229#comment-15096229 ] Tim Allison edited comment on TIKA-1830 at 1/13/16 3:40 PM: Reports on

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096506#comment-15096506 ] Tim Allison commented on TIKA-1830: --- [~thetaphi], good to know. Thank you! Speaking of integration with

[jira] [Updated] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1830: -- Priority: Major (was: Minor) > Upgrade to PDFBox 1.8.11 when available >

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098112#comment-15098112 ] Tim Allison commented on TIKA-1830: --- Argh...I'll rerun the 1.8.10 batch and see what we get. > Upgrade

[jira] [Commented] (TIKA-1833) NoClassDefFoundError for POIXMLTypeLoader

2016-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103984#comment-15103984 ] Tim Allison commented on TIKA-1833: --- We do fairly extensive testing w the app. This is surprising

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098427#comment-15098427 ] Tim Allison commented on TIKA-1830: --- I just tested casting a null object that started life as a null

[jira] [Commented] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-01-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098393#comment-15098393 ] Tim Allison commented on TIKA-1830: --- Finished the rerun...and the results look the same. Question: On

[jira] [Resolved] (TIKA-2022) Add applefile parser

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2022. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Add applefile parser >

[jira] [Resolved] (TIKA-1644) Mime type diffs between 1.8 and 1.9-rc1

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1644. --- Resolution: Fixed > Mime type diffs between 1.8 and 1.9-rc1 > ---

[jira] [Created] (TIKA-2024) Extract original filename/path when possible

2016-06-27 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2024: - Summary: Extract original filename/path when possible Key: TIKA-2024 URL: https://issues.apache.org/jira/browse/TIKA-2024 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2023) Clean up RTFParser to use EndianUtils when extracting embedded objects

2016-06-27 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2023: - Summary: Clean up RTFParser to use EndianUtils when extracting embedded objects Key: TIKA-2023 URL: https://issues.apache.org/jira/browse/TIKA-2023 Project: Tika

[jira] [Commented] (TIKA-2017) Tika Server Cannot handle large files; add option for metadata only

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351398#comment-15351398 ] Tim Allison commented on TIKA-2017: --- [~hmanjuna], all set? Can we close this? > Tika Server Cannot

[jira] [Updated] (TIKA-2023) Clean up RTFParser to use EndianUtils when extracting embedded objects

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2023: -- Description: Let's clean up the RTFParser to use EndianUtils for reading ints/longs. While we're at it,
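
For readers unfamiliar with what a helper for "reading ints/longs" does, here is a minimal plain-Java sketch of byte-order-aware reads from a stream. It assumes little-endian data, which this entry does not state, and it is not Tika's `EndianUtils`; the class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative only: reads little-endian integers from a stream, byte by byte.
class LittleEndianReader {

    // Reads 4 bytes, least significant byte first, and assembles a 32-bit int.
    static int readIntLE(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        // read() returns -1 at end of stream, so any negative value means truncation.
        if ((b0 | b1 | b2 | b3) < 0) {
            throw new EOFException("stream ended inside a 4-byte value");
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    // Reads 8 bytes as two little-endian ints: low word first, then high word.
    static long readLongLE(InputStream in) throws IOException {
        long low = readIntLE(in) & 0xFFFFFFFFL;
        long high = readIntLE(in) & 0xFFFFFFFFL;
        return (high << 32) | low;
    }

    // Convenience wrapper for in-memory data that avoids the checked exception.
    static int intFromBytes(byte[] data) {
        try {
            return readIntLE(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(intFromBytes(data)));
    }
}
```

Centralising reads like these in one utility keeps the shifting and end-of-stream checks out of the parser code, which appears to be the motivation for the cleanup.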

[jira] [Comment Edited] (TIKA-1715) Save embedded images into another location

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351408#comment-15351408 ] Tim Allison edited comment on TIKA-1715 at 6/27/16 5:05 PM: [~damiano], can we

[jira] [Commented] (TIKA-1715) Save embedded images into another location

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351408#comment-15351408 ] Tim Allison commented on TIKA-1715: --- @damiano, can we close this? > Save embedded images into another

[jira] [Assigned] (TIKA-2025) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.13 doesn’t yield the expected results

2016-06-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2025: - Assignee: Tim Allison > Extraction of long sequences of digits from Excel spreadsheets using Tika

[jira] [Created] (TIKA-2022) Add applefile parser

2016-06-25 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2022: - Summary: Add applefile parser Key: TIKA-2022 URL: https://issues.apache.org/jira/browse/TIKA-2022 Project: Tika Issue Type: Improvement Reporter: Tim

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137097#comment-15137097 ] Tim Allison commented on TIKA-741: -- How are you replicating this with 1.11? I'm not able to replicate this

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137110#comment-15137110 ] Tim Allison commented on TIKA-741: -- I'd recommend adding the following to your EnhancedPDF2XHTML:

[jira] [Comment Edited] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137110#comment-15137110 ] Tim Allison edited comment on TIKA-741 at 2/8/16 3:55 PM: -- I'd recommend adding the

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141730#comment-15141730 ] Tim Allison commented on TIKA-1851: --- They'll be packaged along with each tika-parser-x-module's

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137031#comment-15137031 ] Tim Allison commented on TIKA-1851: --- Y, thank you, [~bobpaulin]! > Tika 2.0 - Move test resources from

[jira] [Comment Edited] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135544#comment-15135544 ] Tim Allison edited comment on TIKA-1851 at 2/8/16 3:08 PM: --- So, we're zipping

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137519#comment-15137519 ] Tim Allison commented on TIKA-741: -- bq. Many thanks! I'll upload the fix on our end when I get a chance.

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134502#comment-15134502 ] Tim Allison commented on TIKA-1851: --- Back to normal-ish exceptions. Thank you, [~bobpaulin]! I'll take

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132503#comment-15132503 ] Tim Allison commented on TIKA-1824: --- bq. Thanks so much for the feedback, these are great things to be

[jira] [Created] (TIKA-1853) Upgrade to POI 3.14-final when available

2016-02-04 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1853: - Summary: Upgrade to POI 3.14-final when available Key: TIKA-1853 URL: https://issues.apache.org/jira/browse/TIKA-1853 Project: Tika Issue Type: Improvement
