[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132549#comment-15132549 ] Tim Allison commented on TIKA-1851: --- [~bobpaulin], any chance you could look into why we've been getting

[jira] [Resolved] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1851. --- Resolution: Fixed > Tika 2.0 - Move test resources from core to test-resources >

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132593#comment-15132593 ] Tim Allison commented on TIKA-1851: --- Dunno, but I should have mentioned that I'm getting this when I try

[jira] [Reopened] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1851: --- wrong reason for resolving...need to fix > Tika 2.0 - Move test resources from core to test-resources >

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132613#comment-15132613 ] Tim Allison commented on TIKA-1723: --- Agreed on the ease of building the new ld framework in 2.0. Given

[jira] [Comment Edited] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-12 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144903#comment-15144903 ] Tim Allison edited comment on TIKA-1851 at 2/12/16 5:30 PM: bq. as long as we

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-12 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144903#comment-15144903 ] Tim Allison commented on TIKA-1851: --- bq. as long as we are not shipping test documents around with code

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-12 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145150#comment-15145150 ] Tim Allison commented on TIKA-1851: --- If you beat me to it, please do. I don't think I'll be able to work

[jira] [Created] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-12 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1855: - Summary: TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules Key: TIKA-1855 URL: https://issues.apache.org/jira/browse/TIKA-1855

[jira] [Resolved] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-12 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1851. --- Resolution: Won't Fix Change course. See discussion below. Opened TIKA-1855 to track the new path.

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140211#comment-15140211 ] Tim Allison commented on TIKA-741: -- {noformat} org.apache.tika tika-core

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140227#comment-15140227 ] Tim Allison commented on TIKA-741: -- If the modification is really PDFBox 2.0.0 specific (how to load a file

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142695#comment-15142695 ] Tim Allison commented on TIKA-1851: --- Y. Thought that was a crazy question but wanted to make sure I

[jira] [Reopened] (TIKA-1816) Lenient testing for NamedEntityParser

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1816: --- Assignee: (was: Tim Allison) Reopening until this works in 2.x. > Lenient testing for

[jira] [Commented] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134083#comment-15134083 ] Tim Allison commented on TIKA-1854: --- Will commit shortly. Thank you for the patch and test case! First,

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134070#comment-15134070 ] Tim Allison commented on TIKA-1851: --- Hi Ken, [~thammegowda] is working on making the opennlp test suite

[jira] [Assigned] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1854: - Assignee: Tim Allison > Include the storage class ID of documents embedded in MS Office documents

[jira] [Resolved] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1851. --- Resolution: Invalid Moved shared test resources to test-resources and did some other very small test

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132507#comment-15132507 ] Tim Allison commented on TIKA-1824: --- Sorry, [~grossws], [~thaichat04] and [~lfcnassif] should have

[jira] [Resolved] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1854. --- Resolution: Fixed Committed in trunk and 2.x with small mods. Thank you! > Include the storage class

[jira] [Comment Edited] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134083#comment-15134083 ] Tim Allison edited comment on TIKA-1854 at 2/5/16 1:37 PM: --- Will commit shortly.

[jira] [Created] (TIKA-1852) Tika 2.0 - clean up unit tests to rely more on TikaTest

2016-02-04 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1852: - Summary: Tika 2.0 - clean up unit tests to rely more on TikaTest Key: TIKA-1852 URL: https://issues.apache.org/jira/browse/TIKA-1852 Project: Tika Issue Type:

[jira] [Comment Edited] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134083#comment-15134083 ] Tim Allison edited comment on TIKA-1854 at 2/5/16 1:13 PM: --- Will commit shortly.

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-02-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132895#comment-15132895 ] Tim Allison commented on TIKA-1836: --- Committed workaround to log rather than throw an exception in POI

[jira] [Commented] (TIKA-1854) Include the storage class ID of documents embedded in MS Office documents

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134252#comment-15134252 ] Tim Allison commented on TIKA-1854: --- Got it. This is very helpful. Thank you. bq. Is the same

[jira] [Comment Edited] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135480#comment-15135480 ] Tim Allison edited comment on TIKA-1851 at 2/6/16 2:35 AM: --- Build now works

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135480#comment-15135480 ] Tim Allison commented on TIKA-1851: --- Build now works locally for me after manual download of ner models.

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135506#comment-15135506 ] Tim Allison commented on TIKA-1851: --- Ah, ok, thank you. Y, I was inclined to put the test resources in

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135514#comment-15135514 ] Tim Allison commented on TIKA-1851: --- K. Moving everything back to test now. > Tika 2.0 - Move test

[jira] [Reopened] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1851: --- https://builds.apache.org/job/tika-2.x/20/#showFailuresLink For these two: {noformat}

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135490#comment-15135490 ] Tim Allison commented on TIKA-1851: --- Apologies for the following display of ignorance...but I _think_ the

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135526#comment-15135526 ] Tim Allison commented on TIKA-1851: --- Looks like I have to move the MockParserTest out of the

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135544#comment-15135544 ] Tim Allison commented on TIKA-1851: --- So, we're zipping _all_ the test files into a jar, and then each

[jira] [Comment Edited] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135544#comment-15135544 ] Tim Allison edited comment on TIKA-1851 at 2/6/16 3:52 AM: --- So, we're zipping

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135550#comment-15135550 ] Tim Allison commented on TIKA-1851: --- K. Just pushed the "undo" moving all tika-test-resources back to

[jira] [Updated] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1851: -- Attachment: tika_2x_test_files_and_modules.xlsx I'm attaching the output of a bit of hackery to find

[jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128925#comment-15128925 ] Tim Allison commented on TIKA-1841: --- I'm just about to commit and push 3.14-beta1 (TIKA-1799). Might

[jira] [Reopened] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1830: --- Actually upgrade in Tika 2.x branch > Upgrade to PDFBox 1.8.11 when available >

[jira] [Resolved] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1799. --- Resolution: Fixed Upgraded in both trunk and 2.x branch. Thank you, [~kiwiwings] and [~bobpaulin]!

[jira] [Resolved] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1830. --- Resolution: Fixed > Upgrade to PDFBox 1.8.11 when available > ---

[jira] [Created] (TIKA-1847) Clean up parser versions variables in 2.x

2016-02-02 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1847: - Summary: Clean up parser versions variables in 2.x Key: TIKA-1847 URL: https://issues.apache.org/jira/browse/TIKA-1847 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-1847) Clean up parser version parameters in 2.x

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1847: -- Summary: Clean up parser version parameters in 2.x (was: Clean up parser versions variables in 2.x) >

[jira] [Commented] (TIKA-1848) Address issues with Tika 1.12rc#1

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129480#comment-15129480 ] Tim Allison commented on TIKA-1848: --- Thank you for running DRAT! I think we're ok with CharsetDetector

[jira] [Created] (TIKA-1844) PooledTimeSeriesParser takes precedence over MP4Parser

2016-01-29 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1844: - Summary: PooledTimeSeriesParser takes precedence over MP4Parser Key: TIKA-1844 URL: https://issues.apache.org/jira/browse/TIKA-1844 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126479#comment-15126479 ] Tim Allison commented on TIKA-1845: --- There are two problems that this file reveals. 1) The

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126506#comment-15126506 ] Tim Allison commented on TIKA-1845: --- my failure on TIKA-1010 to set mime correctly. > Unable to extract

[jira] [Assigned] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1845: - Assignee: Tim Allison > Unable to extract content from certain RTFs using tika-server versions

[jira] [Resolved] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1830. --- Resolution: Fixed Assignee: Tim Allison [~thetaphi], I'm sorry I didn't get this into 1.12. I'd

[jira] [Updated] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1830: -- Fix Version/s: 1.13 > Upgrade to PDFBox 1.8.11 when available > ---

[jira] [Commented] (TIKA-1846) Set up Hudson (or similar?) with new Git repo

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130321#comment-15130321 ] Tim Allison commented on TIKA-1846: --- Thank you! > Set up Hudson (or similar?) with new Git repo >

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130360#comment-15130360 ] Tim Allison commented on TIKA-1723: --- Come on over to the 2.x branch, the water is fine. :) Plenty of

[jira] [Commented] (TIKA-1848) Address issues with Tika 1.12rc#1

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130400#comment-15130400 ] Tim Allison commented on TIKA-1848: --- I tested adding headers, and they don't break our tests with the

[jira] [Assigned] (TIKA-1847) Clean up parser version parameters in 2.x

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1847: - Assignee: Tim Allison > Clean up parser version parameters in 2.x >

[jira] [Created] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-03 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1851: - Summary: Tika 2.0 - Move test resources from core to test-resources Key: TIKA-1851 URL: https://issues.apache.org/jira/browse/TIKA-1851 Project: Tika Issue Type:

[jira] [Comment Edited] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131500#comment-15131500 ] Tim Allison edited comment on TIKA-1824 at 2/4/16 1:21 AM: --- bq. Perhaps add

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131500#comment-15131500 ] Tim Allison commented on TIKA-1824: --- bq. Perhaps add "parser(s?) to the artifactId Y, sorry,

[jira] [Updated] (TIKA-1847) Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1847: -- Summary: Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things (was: Clean up parser

[jira] [Resolved] (TIKA-1847) Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1847. --- Resolution: Fixed > Tika 2.0 - Clean up tika-parsers pom dependencies and a few other things >

[jira] [Resolved] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1845. --- Resolution: Fixed in both trunk and 2.x Thank you for raising this and supplying a test file! >

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127009#comment-15127009 ] Tim Allison commented on TIKA-1816: --- [~thammegowda], if you have a chance, would you be willing to try

[jira] [Created] (TIKA-1846) Set up Hudson (or similar?) with new Git repo

2016-02-02 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1846: - Summary: Set up Hudson (or similar?) with new Git repo Key: TIKA-1846 URL: https://issues.apache.org/jira/browse/TIKA-1846 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128261#comment-15128261 ] Tim Allison commented on TIKA-1816: --- 2.x is still very much in flux! If you We/[~bobpaulin] initially

[jira] [Commented] (TIKA-1848) Address issues with Tika 1.12rc#1

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130325#comment-15130325 ] Tim Allison commented on TIKA-1848: --- So, um, I'll try to fix these in trunk. Do we need an rc2 where

[jira] [Updated] (TIKA-1847) Clean up parser version parameters in 2.x

2016-02-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1847: -- Issue Type: Sub-task (was: Task) Parent: TIKA-1824 > Clean up parser version parameters in 2.x

[jira] [Commented] (TIKA-1849) RTF Exception

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131036#comment-15131036 ] Tim Allison commented on TIKA-1849: --- I'm not able to reproduce this in our test suite. To confirm, this

[jira] [Comment Edited] (TIKA-1849) RTF Exception

2016-02-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131036#comment-15131036 ] Tim Allison edited comment on TIKA-1849 at 2/3/16 8:06 PM: --- I'm not able to

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126320#comment-15126320 ] Tim Allison commented on TIKA-1845: --- From the stacktrace, this looks to be related to TIKA-1010. Will

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126366#comment-15126366 ] Tim Allison commented on TIKA-1845: --- Scooped it from evernote. Let me know if I should srm it. > Unable

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126429#comment-15126429 ] Tim Allison commented on TIKA-1845: --- Looks like there is no trouble with the tika-app with straight

[jira] [Resolved] (TIKA-1863) --text-main content missing in output file

2016-02-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1863. --- Resolution: Won't Fix {{--text-main}} uses the {{BoilerpipeContentHandler}}, which tries to determine

[jira] [Commented] (TIKA-1863) --text-main content missing in output file

2016-02-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156876#comment-15156876 ] Tim Allison commented on TIKA-1863: --- Ah, ok. The pdfbox

[jira] [Commented] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156890#comment-15156890 ] Tim Allison commented on TIKA-1855: --- Y. Agreed. I think so. To confirm, when you say "content just

[jira] [Commented] (TIKA-1863) --text-main content missing in output file

2016-02-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156920#comment-15156920 ] Tim Allison commented on TIKA-1863: --- Ok, y, I'm able to reproduce this with {{--text-main}}, but the text

[jira] [Comment Edited] (TIKA-1863) --text-main content missing in output file

2016-02-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156876#comment-15156876 ] Tim Allison edited comment on TIKA-1863 at 2/22/16 1:33 PM: Ah, ok. The pdfbox

[jira] [Commented] (TIKA-1866) Out of memory error on Word document

2016-02-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160099#comment-15160099 ] Tim Allison commented on TIKA-1866: --- Not TikaInputStream's fault. This looks to be a bug deep within

[jira] [Commented] (TIKA-1866) Out of memory error on Word document

2016-02-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160022#comment-15160022 ] Tim Allison commented on TIKA-1866: --- Strike that...image handling is not the problem. If I save the

[jira] [Commented] (TIKA-1866) Out of memory error on Word document

2016-02-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158871#comment-15158871 ] Tim Allison commented on TIKA-1866: --- That's exciting. I'll take a look. > Out of memory error on Word

[jira] [Resolved] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1874. --- Resolution: Fixed > Fix rare npe in XWPFListManager > --- > >

[jira] [Created] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1874: - Summary: Fix rare npe in XWPFListManager Key: TIKA-1874 URL: https://issues.apache.org/jira/browse/TIKA-1874 Project: Tika Issue Type: Bug Reporter:

[jira] [Updated] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1874: -- Description: Many thanks to [~centic]'s

[jira] [Commented] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166589#comment-15166589 ] Tim Allison commented on TIKA-1855: --- I have to admit that I lost a bit of steam on this... I agree with

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167208#comment-15167208 ] Tim Allison commented on TIKA-1607: --- Aside from XMP, I can't think of an example where we'd have multiple

[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170599#comment-15170599 ] Tim Allison edited comment on TIKA-1865 at 2/29/16 1:39 PM: Outlook shows an

[jira] [Updated] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1865: -- Attachment: report.xlsx I took a dump of .msg files from Common Crawl. Several of the files were

[jira] [Created] (TIKA-1879) Extract recipient information in MSG files with more granularity

2016-02-29 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1879: - Summary: Extract recipient information in MSG files with more granularity Key: TIKA-1879 URL: https://issues.apache.org/jira/browse/TIKA-1879 Project: Tika Issue

[jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172626#comment-15172626 ] Tim Allison commented on TIKA-1663: --- In tika-batch/tika-app, I did a not-so-great-workaround with an

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172633#comment-15172633 ] Tim Allison commented on TIKA-1865: --- Y, that's my guess exactly. If anyone has actual knowledge or has

[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172283#comment-15172283 ] Tim Allison edited comment on TIKA-1865 at 2/29/16 9:55 PM: I took a dump of

[jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172605#comment-15172605 ] Tim Allison commented on TIKA-1663: --- Y, I much prefer #2. The parameter part will be solved by

[jira] [Issue Comment Deleted] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1865: -- Comment: was deleted (was: Y, that's my guess exactly. If anyone has actual knowledge or has found

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172685#comment-15172685 ] Tim Allison commented on TIKA-1865: --- Y, that's my guess exactly. If anyone has actual knowledge or has

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168951#comment-15168951 ] Tim Allison commented on TIKA-1865: --- And if you are interested in working on a patch for this, we now

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168996#comment-15168996 ] Tim Allison commented on TIKA-1865: --- [~jeremybmerrill], any interest in this? Want to contribute? >

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170599#comment-15170599 ] Tim Allison commented on TIKA-1865: --- Outlook shows part of a name, but no address. Couldn't see address w

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-27 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170601#comment-15170601 ] Tim Allison commented on TIKA-1865: ---

[jira] [Commented] (TIKA-1866) Out of memory error on Word document

2016-02-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158894#comment-15158894 ] Tim Allison commented on TIKA-1866: --- Looks like something in the image handling is causing problems.

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168949#comment-15168949 ] Tim Allison commented on TIKA-1865: --- Thank you, Nick. > Save sender email address in Outlook MSG

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168946#comment-15168946 ] Tim Allison commented on TIKA-1865: --- Yes and yes...any interest in submitting a patch? If you're

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169009#comment-15169009 ] Tim Allison commented on TIKA-1865: --- Completely agree on all counts. Did not mean to suggest breaking

[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168945#comment-15168945 ] Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM: With the handful of
