[jira] [Comment Edited] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822253#comment-15822253
 ] 

Tim Allison edited comment on TIKA-2232 at 1/13/17 7:51 PM:


Proposed change if jbig2 is not on the classpath:

PDFParser extractInlineImages adds:
{noformat}
X-TIKA:EXCEPTION:warn : org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JBIG2 image: jbig2-imageio is not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:128)
    at org.apache.pdfbox.filter.JBIG2Filter.decode(JBIG2Filter.java:54)
{noformat}
to the metadata of the PDF...
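
A minimal sketch of that behaviour, assuming the missing-reader error surfaces as an IOException (PDFBox's MissingImageReaderException is an IOException subclass) and using a plain Map as a stand-in for Tika's Metadata. The ImageWarningSketch class, the ImageDecoder interface, and decodeOrWarn are hypothetical names for illustration, not Tika code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ImageWarningSketch {

    // Metadata key taken from the example output above.
    static final String WARN_KEY = "X-TIKA:EXCEPTION:warn";

    /** Hypothetical stand-in for the step that decodes one inline image. */
    interface ImageDecoder {
        void decode() throws IOException;
    }

    /**
     * Runs the decoder. On failure, records the exception text under WARN_KEY
     * instead of rethrowing, so the rest of the document can still be parsed.
     * Returns true if the image was decoded, false if a warning was recorded.
     */
    static boolean decodeOrWarn(ImageDecoder decoder, Map<String, List<String>> metadata) {
        try {
            decoder.decode();
            return true;
        } catch (IOException e) {
            metadata.computeIfAbsent(WARN_KEY, k -> new ArrayList<>()).add(e.toString());
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        decodeOrWarn(() -> {
            throw new IOException("Cannot read JBIG2 image: jbig2-imageio is not installed");
        }, metadata);
        System.out.println(metadata);
    }
}
```

The point of the sketch is only that the failure becomes a metadata entry rather than a parse failure; how the real parser wires this in may differ.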

ImageParser checks for JBIG2 in {{try{ Class.forName } ... }} before adding 
jbig2 to {{SUPPORTED_TYPES}}.  If jbig2 is not on the classpath, then the files 
are handled by the EmptyParser, as they used to be.
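
Roughly, the classpath check could look like the sketch below. The probe class name (com.levigo.jbig2.JBIG2ImageReaderSpi), the image/x-jbig2 type string, and the OptionalImageTypes class are assumptions for illustration; the actual ImageParser change may use different names:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class OptionalImageTypes {

    // Assumed name of a class shipped by the optional jbig2-imageio jar.
    static final String JBIG2_PROBE_CLASS = "com.levigo.jbig2.JBIG2ImageReaderSpi";

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalImageTypes.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds the supported-type set, adding JBIG2 only when the probe class is present. */
    static Set<String> supportedTypes(String probeClass) {
        Set<String> types = new LinkedHashSet<>();
        types.add("image/png");
        types.add("image/gif");
        if (isOnClasspath(probeClass)) {
            types.add("image/x-jbig2"); // illustrative type string
        }
        return Collections.unmodifiableSet(types);
    }

    public static void main(String[] args) {
        System.out.println(supportedTypes(JBIG2_PROBE_CLASS));
    }
}
```

With this shape, a parser that never advertises JBIG2 is simply not selected for those files, so they fall through to whatever handled them before.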


was (Author: talli...@mitre.org):
Proposed change if jbig2 is not on the classpath:

PDFParser extractInlineImages adds:
{noformat}
X-TIKA:EXCEPTION:warn : org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JBIG2 image: jbig2-imageio is not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:128)
    at org.apache.pdfbox.filter.JBIG2Filter.decode(JBIG2Filter.java:54)
{noformat}
to the metadata of the PDF...

ImageParser checks for JBIG2 in {{try{ Class.forName}} before adding jp2 to 
{{SUPPORTED_TYPES}}.  If jbig2 is not on the cp, then the files are handled by 
the EmptyParser, as they used to be.

> Add JBIG2 image parsing support
> ---
>
> Key: TIKA-2232
> URL: https://issues.apache.org/jira/browse/TIKA-2232
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 1.14
> Environment: Any
>Reporter: Pascal Essiembre
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.0, 1.15
>
>
> If you are interested, I would like to add support for JBIG2 image files 
> (.jb2, or .jbig2).  I have encountered them in PDFs.
> I will make a pull-request shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Nicholas DiPiazza (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822136#comment-15822136
 ] 

Nicholas DiPiazza edited comment on TIKA-2232 at 1/13/17 6:39 PM:

[~pascal.essiembre] totally

Obviously, with the GPL3 license, most people cannot use the jbig2-imageio 
library. So can we please provide a way to turn off this exception?

{code}
org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JBIG2 image: jbig2-imageio is not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:128) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.filter.JBIG2Filter.decode(JBIG2Filter.java:55) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:385) ~[pdfbox-2.0.1.jar:2.0.1]
{code}


was (Author: nicholas.dipiazza):
[~pascal.essiembre] totally

Obviously, with the GPL3 license, most people cannot use the jbig2-imageio 
library. So can we please provide a way to turn off this exception?

{code}
org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JBIG2 image: jbig2-imageio is not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:128) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.filter.JBIG2Filter.decode(JBIG2Filter.java:55) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:385) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:359) ~[tika-parsers-1.13.jar:1.13]
    at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:271) ~[tika-parsers-1.13.jar:1.13]
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:393) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:214) ~[tika-parsers-1.13.jar:1.13]
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) ~[pdfbox-2.0.1.jar:2.0.1]
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) ~[pdfbox-2.0.1.jar:2.0.1]
{code}

> Add JBIG2 image parsing support
> ---
>
> Key: TIKA-2232
> URL: https://issues.apache.org/jira/browse/TIKA-2232
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 1.14
> Environment: Any
>Reporter: Pascal Essiembre
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.0, 1.15
>
>
> If you are interested, I would like to add support for JBIG2 image files 
> (.jb2, or .jbig2).  I have encountered them in PDFs.
> I will make a pull-request shortly.


