Re: [jira] [Closed] (TIKA-993) Language Detection Fault

2015-03-03 Thread Oleg Tikhonov
The first one found. In this case it will be German. The expected result is a topic to discuss. I would expect to get both detected languages. However, that is beyond Tika's lang.dect. Bottom line: leave it as is until Ken's implementation. On 3 Mar 2015 09:09, Tyler Palsulich tpalsul...@gmail.com wrote: Hi,

[jira] [Commented] (TIKA-1524) Can install Tika-Bundle, missing JUnit dependency

2015-03-03 Thread Pieter (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345055#comment-14345055 ] Pieter commented on TIKA-1524: -- The bundle does not intend to import junit, but still, it

[jira] [Comment Edited] (TIKA-456) Support timeouts for parsers

2015-03-03 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345071#comment-14345071 ] Luis Filipe Nassif edited comment on TIKA-456 at 3/3/15 1:40 PM:

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2015-03-03 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345071#comment-14345071 ] Luis Filipe Nassif commented on TIKA-456: - I also agree with that. Support

[jira] [Resolved] (TIKA-1489) PDF Text extraction without permission

2015-03-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1489. --- Resolution: Fixed Fix Version/s: 1.8 r1663764 PDF Text extraction without permission

[jira] [Commented] (TIKA-1004) Support ansi as an alias for windows-1252 charset

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345675#comment-14345675 ] Tyler Palsulich commented on TIKA-1004: --- Does anyone have an ansi encoded file we can

[jira] [Commented] (TIKA-1004) Support ansi as an alias for windows-1252 charset

2015-03-03 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345687#comment-14345687 ] Konstantin Gribov commented on TIKA-1004: - -1 for this ticket. Windows references

[jira] [Commented] (TIKA-1017) DefaultHtmlMapper misses some safe elements

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345688#comment-14345688 ] Tyler Palsulich commented on TIKA-1017: --- I'm afraid I don't understand what a safe

[jira] [Closed] (TIKA-1008) If we add lig4j jar after tika-app jar in classpath it does not resolve LoggingEvent class getTimeStamp() method

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1008. - Resolution: Won't Fix We've upgraded log4j since this point. If you still have this problem with

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2015-03-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345815#comment-14345815 ] Hudson commented on TIKA-1489: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #527 (See

[jira] [Resolved] (TIKA-1000) secure-processing not supported by some JAXP implementations and causes mime type detection to fail

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1000. --- Resolution: Fixed Assignee: Tyler Palsulich Fixed in r1663779. Thank you!

[jira] [Closed] (TIKA-1004) Support ansi as an alias for windows-1252 charset

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1004. - Resolution: Won't Fix Closing as Won't Fix per the above comment. Thanks! Support ansi as an

[jira] [Commented] (TIKA-1000) secure-processing not supported by some JAXP implementations and causes mime type detection to fail

2015-03-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345816#comment-14345816 ] Hudson commented on TIKA-1000: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #527 (See

[jira] [Commented] (TIKA-1046) Get java.util.zip.ZipException: unknown compression method when indexing ppf97-file containing wmf-image

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346113#comment-14346113 ] Tyler Palsulich commented on TIKA-1046: --- This issue is still happening in Tika

[jira] [Commented] (TIKA-1033) Tika doesn't parse embedded OLE Chart/Graph objects

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346051#comment-14346051 ] Tyler Palsulich commented on TIKA-1033: --- I'm able to reproduce this issue with Tika

[jira] [Commented] (TIKA-1045) Unsupported AutoCAD drawing version: AC1014

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346111#comment-14346111 ] Tyler Palsulich commented on TIKA-1045: --- Attachment still causes this exception on

[jira] [Updated] (TIKA-1045) Unsupported AutoCAD drawing version: AC1014

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1045: -- Labels: new-parser (was: ) Unsupported AutoCAD drawing version: AC1014

[jira] [Comment Edited] (TIKA-1039) Raw image file detected as audio/mpeg

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346093#comment-14346093 ] Tyler Palsulich edited comment on TIKA-1039 at 3/4/15 12:28 AM:

[jira] [Closed] (TIKA-1054) Problem with parsing excel date formats

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich closed TIKA-1054. - Resolution: Not a Problem Closing as Not a Problem, following the above Locale comments. I don't

[jira] [Commented] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346075#comment-14346075 ] Tyler Palsulich commented on TIKA-1038: --- Just commented on PDFBOX-1835 with this

[jira] [Resolved] (TIKA-1066) tika-server ignoring port option

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1066. --- Resolution: Fixed tika-server confirmed to listen on the proper port. Closing as fixed.

[jira] [Commented] (TIKA-1020) Excel 2010 parser missing cell values are not reported resulting in missing columns values

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346025#comment-14346025 ] Tyler Palsulich commented on TIKA-1020: --- Have we changed our mind on this issue? Do

[jira] [Resolved] (TIKA-1040) Could not delete temporary file

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1040. --- Resolution: Fixed Marking as Fixed since we updated the underlying dependencies in the other

[jira] [Resolved] (TIKA-1029) Parser exception with the attached document

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1029. --- Resolution: Fixed Tested with Tika 1.8-SNAPSHOT and no exception was thrown. Closing as Fixed.

[jira] [Resolved] (TIKA-1037) No text extracted from Excel file (rus chars)

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1037. --- Resolution: Fixed Parsed with Tika 1.8-SNAPSHOT with no issues. Closing as fixed. No text

[jira] [Commented] (TIKA-1057) document content property Status is not extracted for *.doc files

2015-03-03 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346120#comment-14346120 ] Tyler Palsulich commented on TIKA-1057: --- Can someone provide a .doc file with a

[jira] [Commented] (TIKA-1007) Improve Concurrency of ParsingReader

2015-03-03 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346263#comment-14346263 ] Luis Filipe Nassif commented on TIKA-1007: -- I've checked the implementation of

[jira] [Commented] (TIKA-1039) Raw image file detected as audio/mpeg

2015-03-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346415#comment-14346415 ] Nick Burch commented on TIKA-1039: -- Without writing a dedicated detector, I'm not sure how

[jira] [Commented] (TIKA-891) Use POST in addition to PUT on method calls in tika-server

2015-03-03 Thread Sergey Beryozkin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344857#comment-14344857 ] Sergey Beryozkin commented on TIKA-891: --- Well, I guess we have to be careful with