[jira] [Updated] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-10-26 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1445: -- Attachment: TIKA-1445.Palsulich.102614.patch Here is an updated patch with the above idea. I

[jira] [Resolved] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1422. --- Resolution: Fixed Fixed in r1634094. Skip over the two failing checks if Tesseract is

[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183383#comment-14183383 ] Tyler Palsulich commented on TIKA-1442: --- Yes, unfortunately. Please see TIKA-1445.

[jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8

2014-10-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183383#comment-14183383 ] Tyler Palsulich edited comment on TIKA-1442 at 10/24/14 8:05 PM:

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-10-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183873#comment-14183873 ] Tyler Palsulich commented on TIKA-1445: --- I've been trying my hand at this some time

[jira] [Reopened] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-22 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich reopened TIKA-1422: --- Reopening as we're still getting odd test failures. org.apache.tika.parser.mail.RFC822ParserTest

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178866#comment-14178866 ] Tyler Palsulich commented on TIKA-1422: --- {code} Results : Failed tests:

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-16 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173839#comment-14173839 ] Tyler Palsulich commented on TIKA-1422: --- Can you check what {{%ErrorLevel%}} is when

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172796#comment-14172796 ] Tyler Palsulich commented on TIKA-1422: --- What version of Tesseract do you have

[jira] [Resolved] (TIKA-1391) Create Parser.parse() example

2014-10-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1391. --- Resolution: Fixed Assignee: Tyler Palsulich Have basic parsing and parseToString

[jira] [Commented] (TIKA-605) Tika GDAL parser

2014-10-11 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168230#comment-14168230 ] Tyler Palsulich commented on TIKA-605: -- See my comments on the RB from a few minutes

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163704#comment-14163704 ] Tyler Palsulich commented on TIKA-1422: --- With my patch from yesterday, all tests are

[jira] [Commented] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers

2014-10-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164145#comment-14164145 ] Tyler Palsulich commented on TIKA-1438: --- In my opinion, a single multivalued metadata

[jira] [Closed] (TIKA-1438) PhoneExtractingContentHandler to not add individual MD entries for individual phone numbers

2014-10-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1438. - Resolution: Not a Problem PhoneExtractingContentHandler to not add individual MD entries for

[jira] [Updated] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-07 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1422: -- Attachment: TIKA-1422.palsulich.100714.patch I attached a preliminary patch which combines

[jira] [Commented] (TIKA-93) OCR support

2014-10-07 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162577#comment-14162577 ] Tyler Palsulich commented on TIKA-93: - Hi [~twigbranch]. OCR is not currently run on

[jira] [Updated] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1422: -- Attachment: TIKA-1422.palsulich.100414.patch Here is a patch implementing option 3. All tests

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-02 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157068#comment-14157068 ] Tyler Palsulich commented on TIKA-1422: --- [~chrismattmann], I believe that patch fails

[jira] [Resolved] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-29 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1420. --- Resolution: Fixed Fix Version/s: 1.7 Assignee: Tyler Palsulich Moved over in

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151232#comment-14151232 ] Tyler Palsulich commented on TIKA-1420: --- Thanks [~gagravarr] and [~chrismattmann].

[jira] [Closed] (TIKA-1240) IncompatibleClassChangeError with - new Tika().parseToString(stream);

2014-09-27 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1240. - Resolution: Not a Problem Assignee: Tyler Palsulich IncompatibleClassChangeError with -

[jira] [Commented] (TIKA-1239) Using Spring and Tika together. Need to extract the content and metadata.

2014-09-27 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150616#comment-14150616 ] Tyler Palsulich commented on TIKA-1239: --- Hi [~iyersudhes...@gmail.com]. Did you end

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146523#comment-14146523 ] Tyler Palsulich commented on TIKA-1422: --- The Hudson builds are now stable with the

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146775#comment-14146775 ] Tyler Palsulich commented on TIKA-1420: --- Initial example added in r1627397. Add

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147272#comment-14147272 ] Tyler Palsulich commented on TIKA-1420: --- Just made some more updates in r1627446. I

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145299#comment-14145299 ] Tyler Palsulich commented on TIKA-1422: --- This assumes that if a user has Tesseract

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145699#comment-14145699 ] Tyler Palsulich commented on TIKA-1420: --- [~gagravarr], sounds good! So, would that

[jira] [Comment Edited] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145699#comment-14145699 ] Tyler Palsulich edited comment on TIKA-1420 at 9/24/14 12:40 AM:

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143273#comment-14143273 ] Tyler Palsulich commented on TIKA-1421: --- I commented on list, but here is a proposed

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-20 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142095#comment-14142095 ] Tyler Palsulich commented on TIKA-1420: --- That's exactly what I'm imagining. As an

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-20 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142310#comment-14142310 ] Tyler Palsulich commented on TIKA-1421: --- bq. Here's how it fails when Tesseract is

[jira] [Resolved] (TIKA-93) OCR support

2014-09-19 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-93. - Resolution: Fixed Added in r1626226. Thanks [~lfcnassif] and everyone! OCR support ---

[jira] [Created] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-19 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1420: - Summary: Add Metadata Extraction to Arbitrary Parsers Key: TIKA-1420 URL: https://issues.apache.org/jira/browse/TIKA-1420 Project: Tika Issue Type:

[jira] [Updated] (TIKA-93) OCR support

2014-09-18 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-93: Attachment: TesseractOCR_Tyler_v4.patch Thank you for the input! I attached a new patch (v4) which

[jira] [Created] (TIKA-1416) Refactor Translator Exception Handling

2014-09-15 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1416: - Summary: Refactor Translator Exception Handling Key: TIKA-1416 URL: https://issues.apache.org/jira/browse/TIKA-1416 Project: Tika Issue Type: Bug

[jira] [Assigned] (TIKA-93) OCR support

2014-09-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich reassigned TIKA-93: --- Assignee: Tyler Palsulich (was: Chris A. Mattmann) OCR support ---

[jira] [Updated] (TIKA-93) OCR support

2014-09-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-93: Assignee: Chris A. Mattmann (was: Tyler Palsulich) OCR support --- Key:

[jira] [Updated] (TIKA-93) OCR support

2014-09-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-93: Attachment: TesseractOCR_Tyler_v3.patch Updated patch which passes all tests whether Tesseract is

[jira] [Commented] (TIKA-1414) How to extract embedded images from PDFs?

2014-09-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134694#comment-14134694 ] Tyler Palsulich commented on TIKA-1414: --- bq. any interest in adding an example for

[jira] [Created] (TIKA-1417) Create Extract Embedded Images from PDFs Example

2014-09-15 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1417: - Summary: Create Extract Embedded Images from PDFs Example Key: TIKA-1417 URL: https://issues.apache.org/jira/browse/TIKA-1417 Project: Tika Issue Type:

[jira] [Comment Edited] (TIKA-1414) How to extract embedded images from PDFs?

2014-09-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134694#comment-14134694 ] Tyler Palsulich edited comment on TIKA-1414 at 9/15/14 11:37 PM:

[jira] [Commented] (TIKA-93) OCR support

2014-09-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132881#comment-14132881 ] Tyler Palsulich commented on TIKA-93: - I've started working on this again. It works well

[jira] [Comment Edited] (TIKA-93) OCR support

2014-09-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132881#comment-14132881 ] Tyler Palsulich edited comment on TIKA-93 at 9/13/14 6:53 PM: --

[jira] [Commented] (TIKA-1407) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d11346a

2014-09-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122003#comment-14122003 ] Tyler Palsulich commented on TIKA-1407: --- Hi [~Neamar]. We're working on releasing 1.6

[jira] [Commented] (TIKA-93) OCR support

2014-08-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106087#comment-14106087 ] Tyler Palsulich commented on TIKA-93: - Kevin Slote: Is Tesseract in the trunk? If so

[jira] [Commented] (TIKA-93) OCR support

2014-08-14 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097575#comment-14097575 ] Tyler Palsulich commented on TIKA-93: - Awesome. Thank you, [~yonyonson]! Let me know if

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095771#comment-14095771 ] Tyler Palsulich commented on TIKA-1387: --- I had these changes locally, but you beat me

[jira] [Commented] (TIKA-1391) Create Parser.parse() example

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096194#comment-14096194 ] Tyler Palsulich commented on TIKA-1391: --- This patch with a few small updates

[jira] [Comment Edited] (TIKA-1391) Create Parser.parse() example

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096194#comment-14096194 ] Tyler Palsulich edited comment on TIKA-1391 at 8/13/14 10:00 PM:

[jira] [Closed] (TIKA-1392) Create a LanguageIdentifier example

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1392. - Resolution: Fixed Assignee: Tyler Palsulich Added in r1617848. Create a

[jira] [Closed] (TIKA-1393) Create Translator example

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1393. - Resolution: Fixed Assignee: Tyler Palsulich Added an example of a MicrosoftTranslator in

[jira] [Updated] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-08-13 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1329: -- Issue Type: Sub-task (was: Improvement) Parent: TIKA-1390 Add RecursiveParserWrapper

[jira] [Commented] (TIKA-93) OCR support

2014-08-12 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094868#comment-14094868 ] Tyler Palsulich commented on TIKA-93: - OCR via Tesseract should be in trunk within a

[jira] [Closed] (TIKA-1394) Create RecursiveMetadata example

2014-08-12 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1394. - Resolution: Duplicate Assignee: Tyler Palsulich Duplicate of TIKA-1329. Create

[jira] [Updated] (TIKA-1385) Create an External Translator

2014-08-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1385: -- Description: We should create an interface similar to ExternalParser which can use a command

[jira] [Commented] (TIKA-1385) Create an External Translator

2014-08-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090899#comment-14090899 ] Tyler Palsulich commented on TIKA-1385: --- Review board with an ExternalTranslator and

[jira] [Commented] (TIKA-1390) Create tika-example module

2014-08-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090933#comment-14090933 ] Tyler Palsulich commented on TIKA-1390: --- Created the example module in r1616815.

[jira] [Resolved] (TIKA-1385) Create an External Translator

2014-08-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1385. --- Resolution: Fixed Fixed in r1616841, after a +1 by [~chrismattmann]. Thanks! Create an

[jira] [Updated] (TIKA-1391) Create Parser.parse() example

2014-08-08 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1391: -- Attachment: TIKA-1391.palsulich.080814.patch Patch with two parsing examples and unit tests for

[jira] [Created] (TIKA-1389) Convert all wildcard imports to explicit imports

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1389: - Summary: Convert all wildcard imports to explicit imports Key: TIKA-1389 URL: https://issues.apache.org/jira/browse/TIKA-1389 Project: Tika Issue Type:

[jira] [Resolved] (TIKA-1389) Convert all wildcard imports to explicit imports

2014-08-07 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1389. --- Resolution: Fixed Fixed in r1616488. I didn't update the tika-java7, though. Most of these

[jira] [Comment Edited] (TIKA-1389) Convert all wildcard imports to explicit imports

2014-08-07 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089209#comment-14089209 ] Tyler Palsulich edited comment on TIKA-1389 at 8/7/14 1:33 PM:

[jira] [Created] (TIKA-1390) Create tika-example module

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1390: - Summary: Create tika-example module Key: TIKA-1390 URL: https://issues.apache.org/jira/browse/TIKA-1390 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-1392) Create a LanguageIdentifier example

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1392: - Summary: Create a LanguageIdentifier example Key: TIKA-1392 URL: https://issues.apache.org/jira/browse/TIKA-1392 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-1391) Create Parser.parse() example

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1391: - Summary: Create Parser.parse() example Key: TIKA-1391 URL: https://issues.apache.org/jira/browse/TIKA-1391 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-1393) Create Translator example

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1393: - Summary: Create Translator example Key: TIKA-1393 URL: https://issues.apache.org/jira/browse/TIKA-1393 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-1394) Create RecursiveMetadata example

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1394: - Summary: Create RecursiveMetadata example Key: TIKA-1394 URL: https://issues.apache.org/jira/browse/TIKA-1394 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-1395) Create embedded image extraction example

2014-08-07 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1395: - Summary: Create embedded image extraction example Key: TIKA-1395 URL: https://issues.apache.org/jira/browse/TIKA-1395 Project: Tika Issue Type: Sub-task

[jira] [Commented] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087608#comment-14087608 ] Tyler Palsulich commented on TIKA-1387: --- I will take a look at these and fix the

[jira] [Assigned] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich reassigned TIKA-1387: - Assignee: Tyler Palsulich Add forbidden-apis checker to TIKA build

[jira] [Updated] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1387: -- Attachment: TIKA-1387.palsulich.080614.patch Here is a patch which includes the forbiddenapi

[jira] [Resolved] (TIKA-1387) Add forbidden-apis checker to TIKA build

2014-08-06 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1387. --- Resolution: Fixed Fix Version/s: 1.7 Fixed in r1616295. Thanks [~thetaphi]! Add

[jira] [Created] (TIKA-1384) Use tika-parent dependency management for common dependencies

2014-08-05 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1384: - Summary: Use tika-parent dependency management for common dependencies Key: TIKA-1384 URL: https://issues.apache.org/jira/browse/TIKA-1384 Project: Tika

[jira] [Created] (TIKA-1385) Create an External Translator

2014-08-05 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1385: - Summary: Create an External Translator Key: TIKA-1385 URL: https://issues.apache.org/jira/browse/TIKA-1385 Project: Tika Issue Type: Bug

[jira] [Closed] (TIKA-1073) root@localhost# java -jar tika-app-1.3.jar Failed to load Main-Class manifest attribute from tika-app-1.3.jar

2014-08-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1073. - Resolution: Cannot Reproduce Assignee: Tyler Palsulich Closing this, since it seems to be a

[jira] [Commented] (TIKA-1317) Tika does not read text from Headers, Cover Pages, and SDT components of DOCX documents

2014-08-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084862#comment-14084862 ] Tyler Palsulich commented on TIKA-1317: --- +1, [~talli...@apache.org]. Tika does not

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-24 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073557#comment-14073557 ] Tyler Palsulich commented on TIKA-1373: --- bq. HtmlParser skips tags generated by

[jira] [Updated] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1373: -- Description: When using the AutoDetectParser in java code, and the SourceCodeParser is

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071942#comment-14071942 ] Tyler Palsulich commented on TIKA-1373: --- The only SAX event in SourceCodeParser is

[jira] [Comment Edited] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071942#comment-14071942 ] Tyler Palsulich edited comment on TIKA-1373 at 7/23/14 4:52 PM:

[jira] [Closed] (TIKA-1050) Charset detection gives wrong results for GB18030 encoding

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1050. - Resolution: Cannot Reproduce Fix Version/s: 1.6 Assignee: Tyler Palsulich The

[jira] [Commented] (TIKA-1172) Out Of Memory exception occurring in GUI on 20MB pdf

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068541#comment-14068541 ] Tyler Palsulich commented on TIKA-1172: --- Hi Erik, Thank you for raising this issue.

[jira] [Commented] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068639#comment-14068639 ] Tyler Palsulich commented on TIKA-1357: --- Fixed in r1612316 with unit test and

[jira] [Resolved] (TIKA-1357) Buffered text in EnviHeaderParser

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1357. --- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich Buffered text

[jira] [Resolved] (TIKA-1251) RuntimeException when parsing word (.doc) documents. Works in Tika 1.4 but not 1.5

2014-07-21 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1251. --- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich Fixed in

[jira] [Resolved] (TIKA-411) Generate list of supported and detected types automatically

2014-07-17 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-411. -- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich Fixed in

[jira] [Resolved] (TIKA-1342) Remove Ambiguous Links in Site

2014-07-17 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1342. --- Resolution: Fixed Assignee: Tyler Palsulich Fixed in r1611403. Remove Ambiguous Links

[jira] [Resolved] (TIKA-1253) SLF4J: The requested version 1.5.6 by your slf4j binding is not compatible with [1.6, 1.7]

2014-07-17 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1253. --- Resolution: Fixed Assignee: Tyler Palsulich Can be closed since we updated the NetCDF

[jira] [Resolved] (TIKA-1105) CompositeParser should use ParseContext when getting correct Parser

2014-07-17 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1105. --- Resolution: Fixed Assignee: Tyler Palsulich Fixed in r1611405. CompositeParser should

[jira] [Resolved] (TIKA-1370) CachedTranslator Implementation

2014-07-17 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1370. --- Resolution: Fixed Fix Version/s: 1.6 Assignee: Tyler Palsulich Added in

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-16 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063566#comment-14063566 ] Tyler Palsulich commented on TIKA-1365: --- Ah, [~tiennm]. Good point. I didn't even

[jira] [Updated] (TIKA-1105) CompositeParser should use ParseContext when getting correct Parser

2014-07-16 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1105: -- Attachment: TIKA-1105.palsulich.071614.patch This really is a trivial update. I attached a

[jira] [Created] (TIKA-1370) CachedTranslator Implementation

2014-07-16 Thread Tyler Palsulich (JIRA)
Tyler Palsulich created TIKA-1370: - Summary: CachedTranslator Implementation Key: TIKA-1370 URL: https://issues.apache.org/jira/browse/TIKA-1370 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062180#comment-14062180 ] Tyler Palsulich commented on TIKA-1365: --- Thanks! It looks like the html is malformed,

[jira] [Commented] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2014-07-15 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062188#comment-14062188 ] Tyler Palsulich commented on TIKA-1367: --- I think that letting users know just how big

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-14 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060655#comment-14060655 ] Tyler Palsulich commented on TIKA-1365: --- Hi [~tiennm]. Thanks for raising this issue.

[jira] [Updated] (TIKA-1327) New parser for Matlab .mat files

2014-07-09 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1327: -- Attachment: TIKA-1327.palsulich.070914.patch Here is a patch off of r1609222 (current trunk).

[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-09 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056510#comment-14056510 ] Tyler Palsulich commented on TIKA-1363: --- I don't have Matlab on this computer. So,

[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-09 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056605#comment-14056605 ] Tyler Palsulich commented on TIKA-1363: --- Great! The patch in TIKA-1327 will take care

[jira] [Updated] (TIKA-1327) New parser for Matlab .mat files

2014-07-09 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1327: -- Attachment: TIKA-1327.palsulich.070914.v2.patch test_mat_text.mat Updated the
