[jira] [Updated] (TIKA-2632) Analyze unknown govdocs files

2018-04-19 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2632: Description: I recently started to analyze randomly govdocs1 files that could not be recognized by

[jira] [Comment Edited] (TIKA-2632) Analyze unknown govdocs files

2018-04-18 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442059#comment-16442059 ] Andreas Meier edited comment on TIKA-2632 at 4/18/18 7:46 AM: -- Thanks for the

[jira] [Commented] (TIKA-2632) Analyze unknown govdocs files

2018-04-18 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442059#comment-16442059 ] Andreas Meier commented on TIKA-2632: - Thanks for the link [~talli...@mitre.org] Glad to see you

[jira] [Updated] (TIKA-2632) Analyze unknown govdocs files

2018-04-18 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2632: Description: I recently started to analyze randomly govdocs1 files that could not be recognized by

[jira] [Updated] (TIKA-2632) Analyze unknown govdocs files

2018-04-16 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2632: Description: I recently started to analyze randomly govdocs1 files that could not be recognized by

[jira] [Updated] (TIKA-2632) Analyze unknown govdocs files

2018-04-13 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2632: Description: I recently started to analyze randomly govdocs1 files that could not be recognized by

[jira] [Created] (TIKA-2632) Analyze unknown govdocs files

2018-04-13 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2632: --- Summary: Analyze unknown govdocs files Key: TIKA-2632 URL: https://issues.apache.org/jira/browse/TIKA-2632 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2629) Add image/x-dpx media-type detection

2018-04-11 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2629: --- Summary: Add image/x-dpx media-type detection Key: TIKA-2629 URL: https://issues.apache.org/jira/browse/TIKA-2629 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2628) Add image/aces media-type detection

2018-04-11 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2628: --- Summary: Add image/aces media-type detection Key: TIKA-2628 URL: https://issues.apache.org/jira/browse/TIKA-2628 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418926#comment-16418926 ] Andreas Meier commented on TIKA-2619: - Can confirm this OutOfMemoryError in Version 1.17 Tried to

[jira] [Commented] (TIKA-2609) Refine Emacs Lisp file recognition (.elc)

2018-03-22 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409192#comment-16409192 ] Andreas Meier commented on TIKA-2609: - Emacs 18 and earlier testfiles can be found under

[jira] [Commented] (TIKA-2611) Tika mistakenly determines mimetype of .js file as application/x-elc

2018-03-22 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409181#comment-16409181 ] Andreas Meier commented on TIKA-2611: - As [~gagravarr] already mentioned you should try to get the

[jira] [Created] (TIKA-2609) Refine Emacs Lisp file recognition (.elc)

2018-03-16 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2609: --- Summary: Refine Emacs Lisp file recognition (.elc) Key: TIKA-2609 URL: https://issues.apache.org/jira/browse/TIKA-2609 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2574) Extend PCX detection in tika-mimetypes.xml

2018-03-15 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400138#comment-16400138 ] Andreas Meier commented on TIKA-2574: - Link to the original published specification taken from the IANA

[jira] [Comment Edited] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-15 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400071#comment-16400071 ] Andreas Meier edited comment on TIKA-2602 at 3/15/18 8:28 AM: -- Unfortunately

[jira] [Updated] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-15 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2602: Attachment: VERSION_Test > iCalendar not properly recognized as text/calendar >

[jira] [Commented] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-15 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400071#comment-16400071 ] Andreas Meier commented on TIKA-2602: - Unfortunately the above mentioned mime-type broke the

[jira] [Commented] (TIKA-2607) Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0

2018-03-14 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398247#comment-16398247 ] Andreas Meier commented on TIKA-2607: - [~talli...@mitre.org] I hope you don't mind that I created the

[jira] [Created] (TIKA-2607) Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0

2018-03-14 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2607: --- Summary: Exchange levigo-jbig2-imageio with pdfbox-jbig2-imageio:3.0.0 Key: TIKA-2607 URL: https://issues.apache.org/jira/browse/TIKA-2607 Project: Tika

[jira] [Commented] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-08 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391054#comment-16391054 ] Andreas Meier commented on TIKA-2602: - The following mime-type will recognize all testfiles correctly:

[jira] [Commented] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-08 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390988#comment-16390988 ] Andreas Meier commented on TIKA-2602: - Thanks for the response, Nick. On my search for more examples I

[jira] [Created] (TIKA-2602) iCalendar not properly recognized as text/calendar

2018-03-08 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2602: --- Summary: iCalendar not properly recognized as text/calendar Key: TIKA-2602 URL: https://issues.apache.org/jira/browse/TIKA-2602 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2576) Add application/zstd detection and parser

2018-03-06 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389180#comment-16389180 ] Andreas Meier commented on TIKA-2576: - I'm glad I could help. > Add application/zstd detection and

[jira] [Comment Edited] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-05 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386017#comment-16386017 ] Andreas Meier edited comment on TIKA-2592 at 3/5/18 12:46 PM: -- Thanks Tim, but

[jira] [Comment Edited] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-05 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386017#comment-16386017 ] Andreas Meier edited comment on TIKA-2592 at 3/5/18 12:44 PM: -- Thanks Tim, but

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-05 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386017#comment-16386017 ] Andreas Meier commented on TIKA-2592: - Thanks Tim, but I think I will just download the govdocs1 and

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-05 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: StandardCharsets_unsupported_by_IANA.txt > HTML with charset unicode handled as utf-16

[jira] [Comment Edited] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383350#comment-16383350 ] Andreas Meier edited comment on TIKA-2592 at 3/2/18 10:56 AM: -- {quote}Before

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: TestHTMLCharsetCP1256.html TestHTMLCharsetArabicCP1256.html > HTML with

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383350#comment-16383350 ] Andreas Meier commented on TIKA-2592: - {quote} Before making this kind of change (default "unicode" to

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-01 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381667#comment-16381667 ] Andreas Meier commented on TIKA-2592: - Thanks for your response [~kkrugler] You are right, "unicode"

[jira] [Created] (TIKA-2594) Mail detected as application/xhtml+xml

2018-02-28 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2594: --- Summary: Mail detected as application/xhtml+xml Key: TIKA-2594 URL: https://issues.apache.org/jira/browse/TIKA-2594 Project: Tika Issue Type: Bug Affects

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380106#comment-16380106 ] Andreas Meier commented on TIKA-2592: - Attached a sample patch to set UTF-8 as default for "unicode"

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: fix-for-TIKA2592-contributed-by-Andreas-Meier.patch > HTML with charset unicode handled

[jira] [Created] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2592: --- Summary: HTML with charset unicode handled as utf-16 instead utf-8 Key: TIKA-2592 URL: https://issues.apache.org/jira/browse/TIKA-2592 Project: Tika Issue

[jira] [Created] (TIKA-2587) DKIM signed mails recognized as text/plain

2018-02-23 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2587: --- Summary: DKIM signed mails recognized as text/plain Key: TIKA-2587 URL: https://issues.apache.org/jira/browse/TIKA-2587 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-2578) Mails not recognized when unknown X-headers are present

2018-02-20 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2578: Component/s: detector > Mails not recognized when unknown X-headers are present >

[jira] [Created] (TIKA-2578) Mails not recognized when unknown X-headers are present

2018-02-20 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2578: --- Summary: Mails not recognized when unknown X-headers are present Key: TIKA-2578 URL: https://issues.apache.org/jira/browse/TIKA-2578 Project: Tika Issue Type:

[jira] [Created] (TIKA-2576) Add application/zstd detection and parser

2018-02-14 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2576: --- Summary: Add application/zstd detection and parser Key: TIKA-2576 URL: https://issues.apache.org/jira/browse/TIKA-2576 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2574) Extend PCX detection in tika-mimetypes.xml

2018-02-12 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2574: --- Summary: Extend PCX detection in tika-mimetypes.xml Key: TIKA-2574 URL: https://issues.apache.org/jira/browse/TIKA-2574 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-2557) .mbox detected as text/html

2018-01-26 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2557: --- Summary: .mbox detected as text/html Key: TIKA-2557 URL: https://issues.apache.org/jira/browse/TIKA-2557 Project: Tika Issue Type: Bug Components:

[jira] [Commented] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-26 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340729#comment-16340729 ] Andreas Meier commented on TIKA-2527: - Added a patch

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-26 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Attachment: fix-for-binhexmatch-TIKA2527-contributed-by-AMeier.patch > Typos in tika-mimetypes.xml >

[jira] [Commented] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-24 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337643#comment-16337643 ] Andreas Meier commented on TIKA-2527: - Added another patch

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-24 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Attachment: enhancement-for-TIKA2527-contributed-by-AMeier.patch > Typos in tika-mimetypes.xml >

[jira] [Commented] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-24 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337531#comment-16337531 ] Andreas Meier commented on TIKA-2527: - I attached a patch to address the mentioned problems.

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-24 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Attachment: fix-for-TIKA2527-contributed-by-AMeier-Fixed-adpcmmi.patch > Typos in tika-mimetypes.xml

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2018-01-24 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Flags: Patch Affects Version/s: 1.18 2.0 > Typos in

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Affects Version/s: 1.17 > Typos in tika-mimetypes.xml

[jira] [Commented] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306071#comment-16306071 ] Andreas Meier commented on TIKA-2527: - I don't know whether I shall open another ticket or not so I

[jira] [Commented] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-27 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304468#comment-16304468 ] Andreas Meier commented on TIKA-2527: - Found another suspect: {code:xml} ESRI Shapefiles

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-14 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Description: Are these mimetypes in tika-mimetypes.xml audio/x-adbcm instead audio/x-adpcm

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-14 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Description: Are these typos in tika-mimetypes.xml audio/x-dec-adbcm instead audio/x-dec-adpcm

[jira] [Updated] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-14 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2527: Description: Are these mimetypes in tika-mimetypes.xml audio/x-dec-adbcm instead

[jira] [Created] (TIKA-2527) Typos in tika-mimetypes.xml

2017-12-14 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2527: --- Summary: Typos in tika-mimetypes.xml Key: TIKA-2527 URL: https://issues.apache.org/jira/browse/TIKA-2527 Project: Tika Issue Type: Bug Components:

[jira] [Comment Edited] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-11-07 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241633#comment-16241633 ] Andreas Meier edited comment on TIKA-2484 at 11/7/17 8:03 AM: -- Thanks for the

[jira] [Commented] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-11-06 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241633#comment-16241633 ] Andreas Meier commented on TIKA-2484: - Thanks for the info [~gagravarr] I think I understand the basic

[jira] [Commented] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-11-06 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240338#comment-16240338 ] Andreas Meier commented on TIKA-2484: - Would be great if you could try to get the CharsetDetector

[jira] [Updated] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-10-30 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2484: Description: I would like to help to improve the recognition accuracy of the CharsetDetector.

[jira] [Updated] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-10-27 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2484: Attachment: IUC10-ar.UTF-7.with-BOM IUC10-ar.UTF-7.without-BOM

[jira] [Created] (TIKA-2484) Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly

2017-10-27 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2484: --- Summary: Improve CharsetDetector to recognize UTF-16LE/BE,UTF-32LE/BE and UTF-7 with/without BOMs correctly Key: TIKA-2484 URL: https://issues.apache.org/jira/browse/TIKA-2484