[jira] [Commented] (TIKA-1539) GRB file magic bytes and extension matching

2015-02-06 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310484#comment-14310484 ] Hudson commented on TIKA-1539: SUCCESS: Integrated in tika-trunk-jdk1.7 #475 (See [https://b

[jira] [Updated] (TIKA-1541) StringsParser: a simple strings-based parser for Tika

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1541: Attachment: TIKA-1541.TotaroMattmann.020615.patch.txt - please find an integrated patch with

[jira] [Assigned] (TIKA-1541) StringsParser: a simple strings-based parser for Tika

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-1541: Assignee: Chris A. Mattmann

[jira] [Commented] (TIKA-1541) StringsParser: a simple strings-based parser for Tika

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310470#comment-14310470 ] Chris A. Mattmann commented on TIKA-1541: Right [~lfcnassif] and on TIKA-1483, it'

[jira] [Commented] (TIKA-1536) Upgrade compiler definition in pom's to Java 7

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310458#comment-14310458 ] Chris A. Mattmann commented on TIKA-1536: I agree with Nick. However this is so tr

[jira] [Commented] (TIKA-1343) Create a Tika Translator implementation that uses JoshuaDecoder

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310456#comment-14310456 ] Chris A. Mattmann commented on TIKA-1343: Hi Lewis, the current status is the foll

[jira] [Commented] (TIKA-1331) Find/configure a vm and gather initial corpus

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310454#comment-14310454 ] Chris A. Mattmann commented on TIKA-1331: I forgot to mention - I'd be happy to sh

[jira] [Commented] (TIKA-1331) Find/configure a vm and gather initial corpus

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310453#comment-14310453 ] Chris A. Mattmann commented on TIKA-1331: Thanks Tim. One thing you may want to co

[jira] [Resolved] (TIKA-1539) GRB file magic bytes and extension matching

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-1539. Resolution: Fixed. Fix Version/s: 1.8. Thanks [~Lukeliush], appreciate it. I went ahead
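TIKA-1539 is about recognizing GRB (GRIB) weather-data files from their magic bytes and their file extension. As a rough illustration only, the sketch below assumes that a file begins directly with the four ASCII bytes "GRIB" (the start of a GRIB message's indicator section) and that the extension of interest is ".grb". The class and method names are hypothetical, and this is not the detection rule that was committed to Tika's MIME type definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper: a rough GRIB check combining magic bytes and file extension. */
public final class GribSniffer {
    // Assumption: the file starts directly with the GRIB indicator section,
    // whose first four bytes are the ASCII characters "GRIB".
    private static final byte[] MAGIC = "GRIB".getBytes(StandardCharsets.US_ASCII);

    private GribSniffer() {}

    /** True if the buffer starts with the "GRIB" magic bytes. */
    public static boolean hasGribMagic(byte[] head) {
        return head != null
            && head.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    /** True if the file name ends in ".grb" (case-insensitive); other extensions are not handled. */
    public static boolean hasGribExtension(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".grb");
    }

    /** Returns true if either the magic bytes or the extension match. */
    public static boolean looksLikeGrib(byte[] head, String fileName) {
        return hasGribMagic(head) || hasGribExtension(fileName);
    }

    public static void main(String[] args) {
        byte[] sample = {'G', 'R', 'I', 'B', 0, 0, 0, 2};
        System.out.println(looksLikeGrib(sample, "forecast.bin"));             // true
        System.out.println(looksLikeGrib(new byte[] {1, 2, 3, 4}, "data.txt")); // false
    }
}
```

A production detector would normally trust the magic bytes over the file name; the sketch accepts either signal only to keep it short.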

[jira] [Assigned] (TIKA-1539) GRB file magic bytes and extension matching

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-1539: Assignee: Chris A. Mattmann

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310027#comment-14310027 ] Tim Allison commented on TIKA-1544: Sounds good. As for unit tests passing, they do. W

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310014#comment-14310014 ] Michael McCandless commented on TIKA-1544: bq. I have hesitation about changing t

[jira] [Comment Edited] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309977#comment-14309977 ] Tim Allison edited comment on TIKA-1544 at 2/6/15 9:45 PM: Y, I'

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309977#comment-14309977 ] Tim Allison commented on TIKA-1544: Y, I'm not sure. If there are consecutive {{\par}}s

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309956#comment-14309956 ] Michael McCandless commented on TIKA-1544: bq. Michael McCandless, is the fix thi

[jira] [Updated] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1544: Attachment: testRTFNewlines.rtf. Will shorten this before committing, if the fix actually works.

[jira] [Updated] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1544: Attachment: preserve_new_lines_in_rtf.patch. [~mikemccand], is the fix this simple? Or am I missing unint

[jira] [Commented] (TIKA-936) encoding of ZipArchiveInputStream

2015-02-06 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309874#comment-14309874 ] Hudson commented on TIKA-936: SUCCESS: Integrated in tika-trunk-jdk1.7 #474 (See [https://buil

[jira] [Commented] (TIKA-1542) Substitute Apache TTF test file for current non-Apache friendly file

2015-02-06 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309875#comment-14309875 ] Hudson commented on TIKA-1542: SUCCESS: Integrated in tika-trunk-jdk1.7 #474 (See [https://b

[jira] [Commented] (TIKA-1542) Substitute Apache TTF test file for current non-Apache friendly file

2015-02-06 John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309864#comment-14309864 ] John Hewson commented on TIKA-1542: That's a good choice.

[jira] [Resolved] (TIKA-1542) Substitute Apache TTF test file for current non-Apache friendly file

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1542. Resolution: Fixed. r1657952. Went with Google's Open Sans, which is ASF 2.0 according to Google's [site

[jira] [Reopened] (TIKA-1542) Substitute Apache TTF test file for current non-Apache friendly file

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1542: See discussion on PDFBOX-2383. Thank you to [~jahewson] and [~tilman] for digging a bit more into my initi

[jira] [Updated] (TIKA-936) encoding of ZipArchiveInputStream

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-936: Fix Version/s: 1.8

[jira] [Resolved] (TIKA-936) encoding of ZipArchiveInputStream

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-936. Resolution: Fixed. Thanks for the patch in pull request #27, [~kongxianghe1234]. I committed t

[GitHub] tika pull request: Update RarParser.java

2015-02-06 asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/27

[jira] [Commented] (TIKA-936) encoding of ZipArchiveInputStream

2015-02-06 ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309736#comment-14309736 ] ASF GitHub Bot commented on TIKA-936: Github user asfgit closed the pull request at:

[jira] [Assigned] (TIKA-936) encoding of ZipArchiveInputStream

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-936: Assignee: Chris A. Mattmann (was: Jukka Zitting)

[jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309614#comment-14309614 ] Chris A. Mattmann commented on TIKA-1535: Note that MimeTypes implements Detector,

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309621#comment-14309621 ] Tim Allison commented on TIKA-1544: Ah, ok. I'll take a look. I've been away from the

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-02-06 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309618#comment-14309618 ] Chris A. Mattmann commented on TIKA-1517: See my comments on TIKA-1535.

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 mortee (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309568#comment-14309568 ] mortee commented on TIKA-1544: Unfortunately, I'm not familiar with the code at all. To be ho

[jira] [Commented] (TIKA-1543) TesseractOCRParser.setTesseractPath() doesn't work on Linux

2015-02-06 Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309325#comment-14309325 ] Konstantin Gribov commented on TIKA-1543: Is the tesseract binary executable? Is it ca

[jira] [Commented] (TIKA-1544) empty lines are not preserved

2015-02-06 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309268#comment-14309268 ] Tim Allison commented on TIKA-1544: I'd be happy to review a patch. :)

[jira] [Updated] (TIKA-1544) empty lines are not preserved

2015-02-06 mortee (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mortee updated TIKA-1544: Description: I'm trying to extract the text content from RTF documents. The files contain empty lines (two or more

[jira] [Created] (TIKA-1544) empty lines are not preserved

2015-02-06 mortee (JIRA)
mortee created TIKA-1544: Summary: empty lines are not preserved Key: TIKA-1544 URL: https://issues.apache.org/jira/browse/TIKA-1544 Project: Tika Issue Type: Bug Affects Versions: 1.6 E

Re: [jira] [Created] (TIKA-1543) TesseractOCRParser.setTesseractPath() doesn't work on Linux

2015-02-06 Oleg Tikhonov
Hi, just one guess: did you check the permissions? Does it have executable permission? Br, Oleg. On 6 Feb 2015 12:15, "Sean Zhao (JIRA)" wrote: > Sean Zhao created TIKA-1543: Summary: TesseractOCRParser.setTesseractPath() doesn't work on Linux
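Both Konstantin Gribov and Oleg Tikhonov suggest checking whether the tesseract binary is actually executable. A minimal Java sketch of that check, using only `java.nio.file.Files`, is shown below. The class name, method name, and default path `/usr/bin/tesseract` are hypothetical, and the code does not reproduce how TesseractOCRParser itself locates or launches tesseract.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: explain why a binary at the given path may not run. */
public final class ExecutableCheck {
    private ExecutableCheck() {}

    /** Returns null if the file looks runnable, otherwise a short reason. */
    public static String problemWith(Path binary) {
        if (!Files.exists(binary)) {
            return "not found: " + binary;
        }
        if (!Files.isRegularFile(binary)) {
            return "not a regular file: " + binary;
        }
        if (!Files.isExecutable(binary)) {
            // On Linux this usually means the execute bit is missing (e.g. fix with chmod +x).
            return "not executable: " + binary;
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real path as the first argument.
        Path p = Paths.get(args.length > 0 ? args[0] : "/usr/bin/tesseract");
        String problem = problemWith(p);
        System.out.println(problem == null ? "looks executable: " + p : problem);
    }
}
```

Returning a reason string rather than a boolean keeps the "not found" and "not executable" cases distinct, which are the two situations the replies above are trying to tell apart.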

[jira] [Created] (TIKA-1543) TesseractOCRParser.setTesseractPath() doesn't work on Linux

2015-02-06 Sean Zhao (JIRA)
Sean Zhao created TIKA-1543: Summary: TesseractOCRParser.setTesseractPath() doesn't work on Linux Key: TIKA-1543 URL: https://issues.apache.org/jira/browse/TIKA-1543 Project: Tika Issue Type: Bug