[jira] [Commented] (TIKA-1153) Upgrade pdfbox to latest 1.8.2 version

2013-08-15 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741785#comment-13741785
 ] 

Tim Allison commented on TIKA-1153:
---

Committed as of r1514551.

 Upgrade pdfbox to latest 1.8.2 version
 --

             Key: TIKA-1153
             URL: https://issues.apache.org/jira/browse/TIKA-1153
         Project: Tika
      Issue Type: Improvement
      Components: parser
Affects Versions: 1.4
     Environment: Windows/Linux
        Reporter: Hong-Thai Nguyen
        Priority: Critical
         Fix For: 1.5


 Current version is 1.8.1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


permissions to close issue?

2013-08-15 Thread Allison, Timothy B.
All,

 I don't appear to have permission to close issues that I didn't open 
(TIKA-1001 and TIKA-1153).  Is this standard JIRA policy or user error?  Thank 
you.

Best,

Tim

-----Original Message-----
From: Tim Allison (JIRA) [mailto:j...@apache.org] 
Sent: Wednesday, August 14, 2013 10:05 PM
To: dev@tika.apache.org
Subject: [jira] [Commented] (TIKA-1001) tika no longer seems to honor HTTP meta 
tag for arabic text in ISO-8859-6 charset


[ 
https://issues.apache.org/jira/browse/TIKA-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740580#comment-13740580
 ] 

Tim Allison commented on TIKA-1001:
---

Fixed as of r1514126. Thank you for submitting this issue with a test file!

 tika no longer seems to honor HTTP meta tag for arabic text in ISO-8859-6 
 charset
 -

             Key: TIKA-1001
             URL: https://issues.apache.org/jira/browse/TIKA-1001
         Project: Tika
      Issue Type: Bug
      Components: parser
Affects Versions: 1.2
        Reporter: david lemon
     Attachments: badarabic.html, TIKA-1001v1.tar.gz


 The attached document extracts correctly in Tika 1.1 but incorrectly in 
 Tika 1.2.
 The difference appears to be that Tika 1.1 honors the HTTP meta Content-Type 
 tag, which specifies the charset as ISO-8859-6, and correctly converts the 
 output to UTF-8.
 Tika 1.2 appears to ignore the charset specified in the meta tag.
 Some experimentation seems to indicate that the problem is the charset.
 It doesn't matter what mode Tika is used in (server, app mode, etc.): even if 
 the content type is specified with a charset, the output is still garbage.


