[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633429#comment-14633429 ] Tim Allison commented on TIKA-1678: [~tilman], y, that's taken from the xmp. As you

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633433#comment-14633433 ] Tim Allison commented on TIKA-1238: [~rangma], any chance you could share a test file?

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633454#comment-14633454 ] Tim Allison edited comment on TIKA-1678 at 7/20/15 11:43 AM:

[jira] [Commented] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633436#comment-14633436 ] Tim Allison commented on TIKA-1690: tmpFile? Do you mean the fileUrl? Sorry.

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633454#comment-14633454 ] Tim Allison commented on TIKA-1678: The good news is that with PDFBox 2.0, we get a

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633473#comment-14633473 ] Magesh Tarala commented on TIKA-1238: Hi Tim, the files have personal information and

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633511#comment-14633511 ] Tim Allison commented on TIKA-1238: Got it. For now, let's see if I can find some

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633651#comment-14633651 ] Magesh Tarala commented on TIKA-1238: Tim, I've attached a file. Could you download

[jira] [Updated] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magesh Tarala updated TIKA-1238: Attachment: 873911_100_20061124_191408.msg Update OutlookExtractor to handle codepage

[jira] [Commented] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

2015-07-20 Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633606#comment-14633606 ] Chris A. Mattmann commented on TIKA-1690: No, I mean TikaUtkls.getInputStream

[jira] [Commented] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633610#comment-14633610 ] Tim Allison commented on TIKA-1690: Is the problem {{is.available()}}? {noformat}
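Tim's question points at a well-known pitfall. Per the `java.io.InputStream` Javadoc, `available()` returns only an estimate of how many bytes can be read without blocking, so a network or piped stream can report 0 while data is still arriving. The sketch below is not the tika-server code and assumes only that some code path uses `available()` to decide whether a request body is empty. It shows a safer check that actually reads one byte through a `PushbackInputStream` and pushes it back. The names `EmptyStreamCheck`, `isEmpty` and `slowStream` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class EmptyStreamCheck {

    // Hypothetical helper: decides emptiness by reading one byte instead of
    // trusting available(), which is only an estimate. Callers must keep using
    // the same PushbackInputStream so the peeked byte is not lost.
    static boolean isEmpty(PushbackInputStream in) throws IOException {
        int b = in.read();      // blocks until a byte arrives or end of stream
        if (b == -1) {
            return true;        // genuine end of stream: nothing to parse
        }
        in.unread(b);           // push the byte back for the parser
        return false;
    }

    // Simulates a stream that has data but, like many network streams,
    // reports 0 from available().
    static InputStream slowStream(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = slowStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("available() = " + raw.available()); // 0, yet data exists
        PushbackInputStream in = new PushbackInputStream(raw);
        System.out.println("isEmpty = " + isEmpty(in));          // false
        System.out.println("first byte: " + (char) in.read());   // h
    }
}
```

Peeking and unreading keeps the stream intact for whatever parser consumes it next, which a check based on `available()` cannot guarantee either way.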

[jira] [Comment Edited] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633610#comment-14633610 ] Tim Allison edited comment on TIKA-1690 at 7/20/15 1:43 PM: Is

[jira] [Comment Edited] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-07-20 jayesh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633637#comment-14633637 ] jayesh edited comment on TIKA-1285 at 7/20/15 2:11 PM: Any idea

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-07-20 jayesh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633637#comment-14633637 ] jayesh commented on TIKA-1285: Any idea, guys, when we can accommodate PDFBox 2.0 in Tika?

Re: [jira] [Commented] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

2015-07-20 Udai
Hi dev, I am part of this group by mistake. Kindly unsubscribe me, as none of the emails make any sense to me :) Regards, Udai On 20 July 2015 at 19:07, Chris A. Mattmann (JIRA) j...@apache.org wrote:

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633643#comment-14633643 ] Tim Allison commented on TIKA-1285: Still hammering out some issues. If regression tests

[jira] [Updated] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1238: Attachment: (was: 873911_100_20061124_191408.msg) Update OutlookExtractor to handle codepage

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-07-20 jayesh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633655#comment-14633655 ] jayesh commented on TIKA-1285: org.apache.fontbox.ttf.TrueTypeFont initializeTable SEVERE: An

[jira] [Issue Comment Deleted] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magesh Tarala updated TIKA-1238: Comment: was deleted (was: Tim, I've attached a file. Could you download it and we can delete from

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633780#comment-14633780 ] Magesh Tarala commented on TIKA-1238: Tim, this fix will be in 1.10, right? When do

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633696#comment-14633696 ] Tim Allison commented on TIKA-1238: Probably not the best way to transfer a file... I

[jira] [Commented] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633786#comment-14633786 ] Tim Allison commented on TIKA-1238: That's up to the community, but I think we have

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633722#comment-14633722 ] Tilman Hausherr commented on TIKA-1678: Yes, such a string check would be useful. Or

[jira] [Reopened] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1238: Doh. Reopening until we get the mods to POI and then the updated Tika code after the next POI release.

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633687#comment-14633687 ] Tilman Hausherr commented on TIKA-1678: Sure: {code} public class Tika1678 extends

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634065#comment-14634065 ] Tilman Hausherr commented on TIKA-1678: Yes, please do and attach the file. It's late

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633986#comment-14633986 ] Tim Allison edited comment on TIKA-1678 at 7/20/15 7:38 PM:

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633986#comment-14633986 ] Tim Allison commented on TIKA-1678: That works perfectly. Thank you, [~tilman]! Now

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634045#comment-14634045 ] Tilman Hausherr commented on TIKA-1678: Likely a bug. I tried calling getTitle

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634045#comment-14634045 ] Tilman Hausherr edited comment on TIKA-1678 at 7/20/15 8:41 PM:

[jira] [Comment Edited] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633487#comment-14633487 ] Tim Allison edited comment on TIKA-1238 at 7/20/15 3:34 PM: The

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633720#comment-14633720 ] Tim Allison commented on TIKA-1678: Very helpful! If we require that the string start
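The "string check" discussed for TIKA-1678 concerns a title whose bytes are really UTF-16. Per the PDF specification, a text string encoded as UTF-16BE begins with the byte-order mark FE FF; if those bytes are decoded one-per-character as ISO-8859-1, the string starts with the two characters U+00FE U+00FF ("þÿ"). The sketch below assumes that is the symptom (the truncated comments do not say which check was finally adopted) and re-decodes such a string. It is not the code that went into Tika or PDFBox, and `Utf16TitleFix` and `fixTitle` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class Utf16TitleFix {

    // Hypothetical helper. Assumes the title was decoded byte-for-byte as
    // ISO-8859-1, so a UTF-16BE byte-order mark (0xFE 0xFF) shows up as the
    // chars U+00FE U+00FF at the start of the string.
    static String fixTitle(String title) {
        if (title == null || title.length() < 2
                || title.charAt(0) != '\u00FE' || title.charAt(1) != '\u00FF') {
            return title; // no mis-decoded BOM: leave the string untouched
        }
        // ISO-8859-1 maps each char 0-255 back to exactly one byte,
        // recovering the original bytes (assumes no chars above U+00FF).
        byte[] raw = title.getBytes(StandardCharsets.ISO_8859_1);
        // Decode as UTF-16BE, skipping the 2-byte BOM.
        return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Simulate the bug: UTF-16BE bytes with a BOM, read as ISO-8859-1.
        byte[] pdfBytes = "\uFEFFTitle".getBytes(StandardCharsets.UTF_16BE);
        String misread = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        System.out.println(fixTitle(misread)); // Title
    }
}
```

Requiring the BOM at the very start keeps the check conservative: ordinary Latin-1 titles almost never begin with "þÿ", so they pass through unchanged.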

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-20 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634391#comment-14634391 ] Tim Allison commented on TIKA-1678: Slight modification of [~tilman]'s example added in