[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 10:57 PM:
---

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

Does it display correctly or not?

And which Java version are you using? If it isn't the latest, what happens if 
you update to Java SE 7 Update 55?


was (Author: tilman):
I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?

> Convert PDF to Image (Strange Color)
> ------------------------------------
>
>                 Key: PDFBOX-2041
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.4
>         Environment: Java (1.7.0_45), OS (Ubuntu)
>            Reporter: ahfei
>         Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
> pdfbox-2041.pdf-1-good.png
>
>
> Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
> Below is the code I'm using:
> BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
> ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);
> After the conversion, the image doesn't look like the PDF. Half of the page 
> turns blue and black.
> Attached images & PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2042.
-

Resolution: Fixed

Thanks for the feedback. Btw, PDLab has the same problem, I'll create an issue 
soon.

> ColorSpace with empty Range array
> ---------------------------------
>
>                 Key: PDFBOX-2042
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.4, 1.8.5, 2.0.0
>            Reporter: Juraj Lonc
>            Assignee: Tilman Hausherr
>             Fix For: 1.8.5, 2.0.0
>
>         Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf
>
>
> I have a PDF document in which I am modifying a PDPage content stream.
> The saved document is invalid (Adobe Reader complains about it).
> I have narrowed it down to the ColorSpace.
> The original document has this colorspace:
> /ColorSpace <<
>   /Cs6 [/ICCBased <<
>     /Alternate /DeviceRGB
>     /Filter /FlateDecode
>     /Length 2597
>     /N 3
>   >>]>>
> The modified document has this colorspace:
> /ColorSpace <<
>   /Cs6 [/ICCBased <<
>     /Alternate /DeviceRGB
>     /Filter /FlateDecode
>     /Length 2597
>     /N 3
>     /Range []
>   >>]>>
> When I manually remove "/Range []" from the PDF, Adobe Reader opens it 
> without an error.
> Obviously that range is added by calling PDICCBased.getRangeArray(0) 
> somewhere.





[jira] [Resolved] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2044.
-

   Resolution: Fixed
Fix Version/s: 2.0.0

Fixed in rev 1589893 for the trunk.

> TrueType glyphs not displayed in rendering
> ------------------------------------------
>
>                 Key: PDFBOX-2044
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2044
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>             Fix For: 2.0.0
>
>
> In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version. It 
> works in the 1.8 version which uses awt.
> The cause is related to the truetype 'loca' table:
> https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html
> In the table of that file, the endOfGlyphs variable, which is the last offset 
> value ("extra" in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
> read any glyphs because it believes that the end has already been reached, 
> because the first offset is (of course) 0 and is identical to the endOfGlyphs 
> variable.
> I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, 
> and not asking for offset equality to skip glyphs, instead I require that the 
> next offset is bigger.





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980165#comment-13980165
 ] 

Juraj Lonc commented on PDFBOX-2042:


Thanks for fix ;)



[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980141#comment-13980141
 ] 

Andreas Lehmkühler commented on PDFBOX-2042:


[~tilman] Thanks for the prompt action



[jira] [Resolved] (PDFBOX-1069) Ubuntu throws exceptions when fonts missing

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-1069.
-

   Resolution: Later
Fix Version/s: 2.0.0
   1.8.5

> Ubuntu throws exceptions when fonts missing
> -------------------------------------------
>
>                 Key: PDFBOX-1069
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1069
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 10.10
>            Reporter: Sarah Kelley
>             Fix For: 1.8.5, 2.0.0
>
>         Attachments: sakelley_pdf_rendering_problem.zip
>
>
> On a plain vanilla Ubuntu 10.10 install, running
> run-all failed to render any text, and threw lots of exceptions:
> 
> 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:425)
> 
> ...however, installing the package "ttf-mscorefonts-installer"
> made those exceptions go away.
> (ubuntu1010_output.txt shows the exceptions; ubuntu1010_try2_output.txt is a 
> run after the extra fonts are installed)
> 
> Might be able to fix this one by setting UNKNOWN_FONT in
> Resources/PDFBox_External_Fonts.properties, but it would seem like
> it should choose some reasonable default if it isn't set...
> shouldn't it?





[jira] [Updated] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2044:


Description: 
In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version. It 
works in the 1.8 version which uses awt.

The cause is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value ("extra" in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.

  was:
In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version.

This is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value ("extra" in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.




[jira] [Created] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2044:
---

 Summary: TrueType glyphs not displayed in rendering
 Key: PDFBOX-2044
 URL: https://issues.apache.org/jira/browse/PDFBOX-2044
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr


In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version.

This is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value ("extra" in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.
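
The planned check can be sketched in isolation. The following is a simplified model of the logic described above, not the actual GlyphTable code: the class name LocaSketch and the method glyphsWithData are hypothetical, and the 'loca' table is assumed to be already decoded into a long[] with numGlyphs + 1 entries.

```java
// Hypothetical illustration only; not the real fontbox GlyphTable.
final class LocaSketch
{
    /**
     * @param offsets decoded 'loca' offsets, numGlyphs + 1 entries;
     *                the last entry is the "extra" end offset.
     * @return for each glyph, whether it would be read.
     */
    static boolean[] glyphsWithData(long[] offsets)
    {
        if (offsets.length < 2)
        {
            return new boolean[0]; // no glyph entries at all
        }
        int numGlyphs = offsets.length - 1;
        long endOfGlyphs = offsets[numGlyphs];
        boolean[] hasData = new boolean[numGlyphs];
        for (int i = 0; i < numGlyphs; i++)
        {
            // Trust the end marker only when it is non-zero; a 0 marker
            // would otherwise match the first offset and stop immediately.
            if (endOfGlyphs != 0 && offsets[i] == endOfGlyphs)
            {
                break;
            }
            // A glyph is read only if the next offset is bigger.
            if (offsets[i + 1] > offsets[i])
            {
                hasData[i] = true;
            }
        }
        return hasData;
    }
}
```

With offsets {0, 10, 10, 30, 0} (end marker 0, as in the problem file), glyphs 0 and 2 are read and glyphs 1 and 3 are skipped, instead of nothing being read at all.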





[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978953#comment-13978953
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 6:48 PM:
--

1. The PDF file is corrupt. A look at it with Notepad++ shows %%EOF followed by 
trash characters. Deleting everything after the marker makes the file much 
smaller: 518 KB instead of 4.85 MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some point.
3. The 2.0 version can't open it with the non-sequential parser; the 
sequential parser can.
4. The 1.8 version renders it fine; the 2.0 version has many glyphs missing, 
maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 
version that I use for myself. I will handle that problem in PDFBOX-2044.
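
The cleanup in item 1 (keep %%EOF, drop what follows) could also be done programmatically. This is a minimal sketch, not a PDFBox feature: EofTrimmer and its methods are hypothetical names, and it assumes the trailing garbage does not itself contain the bytes "%%EOF", since it cuts after the last occurrence of the marker.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of PDFBox.
final class EofTrimmer
{
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Index of the last occurrence of MARKER in data, or -1 if absent.
    static int lastIndexOfMarker(byte[] data)
    {
        for (int start = data.length - MARKER.length; start >= 0; start--)
        {
            boolean match = true;
            for (int i = 0; i < MARKER.length; i++)
            {
                if (data[start + i] != MARKER[i])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                return start;
            }
        }
        return -1;
    }

    // Keeps everything up to and including the last %%EOF and drops the rest.
    static byte[] trimAfterEof(byte[] data)
    {
        int pos = lastIndexOfMarker(data);
        if (pos < 0)
        {
            return data.clone(); // no marker found: leave the content alone
        }
        return Arrays.copyOf(data, pos + MARKER.length);
    }
}
```

The input would typically come from reading the whole file into a byte array, and the result should be written to a new file rather than over the original.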


was (Author: tilman):
1. The PDF file is corrupt. A look at it with NOTEPAD++ shows %%EOF and then 
trash characters. Deleting all after that one makes the file much smaller, 
518KB instead of 4,85MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some time.
3. The 2.0 version isn't able to open it with the non sequential parser, the 
sequential parser can open it.
4. The 1.8 version renders it fine, the 2.0 version has many glyphs missing, 
maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 
version that I use for myself.



[jira] [Comment Edited] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980070#comment-13980070
 ] 

Tilman Hausherr edited comment on PDFBOX-2042 at 4/24/14 6:37 PM:
--

Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.

And [~chupacabras] gets a saved PDF that does not have an unneeded default 
range as in the previous fix.


was (Author: tilman):
Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.



[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980070#comment-13980070
 ] 

Tilman Hausherr commented on PDFBOX-2042:
-

Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.



[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979853#comment-13979853
 ] 

Andreas Lehmkühler commented on PDFBOX-2042:


[~tilman] There is another more important bug to be fixed. A read operation 
must not alter the pdf, saying that, please remove the setItem() call.



[jira] [Commented] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979846#comment-13979846
 ] 

Tilman Hausherr commented on PDFBOX-2043:
-

Please attach a PDF file, tell what you would expect, and what you got, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?

> While Reading a PDF which contains Image the Content of the PDF is misaligned 
> in the resulting text.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2043
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2043
>             Project: PDFBox
>          Issue Type: Bug
>         Environment: Visual Studio 2005
>            Reporter: Venkatesan
>
> We are trying to read the content of a PDF file. The PDF has images in the 
> header. We use the PDFTextStripper.getText() method. After calling this 
> method, the resulting text is misaligned compared to the original PDF.





[jira] [Comment Edited] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979846#comment-13979846
 ] 

Tilman Hausherr edited comment on PDFBOX-2043 at 4/24/14 3:36 PM:
--

Please attach a PDF file, tell what you expected, and what you got instead, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?


was (Author: tilman):
Please attach a PDF file, a tell what you would expect, and what you got, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?



[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979838#comment-13979838
 ] 

Tilman Hausherr commented on PDFBOX-2042:
-

Thanks for the test.

Ouch!!! "private COSArray getRangeArray(int n)" does something sometimes _and_ 
returns something, and has two bugs. What I think was intended was to extend 
the range array with default values when needed. However the default values are 
the wrong ones (should be 0 1 according to the spec, not -100 100, this is for 
LAB), and the array extension isn't done because of an off-by-one mistake. I 
have committed a fix in rev 1589767 for the trunk and rev 1589769 for the 1.8 
branch.

Before:
{code}
private COSArray getRangeArray(int n)
{
    COSArray rangeArray = (COSArray)stream.getStream().getDictionaryObject(COSName.RANGE);
    if(rangeArray == null)
    {
        rangeArray = new COSArray();
        stream.getStream().setItem(COSName.RANGE, rangeArray);
        while(rangeArray.size() < n*2)
        {
            rangeArray.add(new COSFloat(-100));
            rangeArray.add(new COSFloat(100));
        }
    }
    return rangeArray;
}
{code}

After:

{code}
/**
 * Get the range array, create and fill it with default values (0, 1) if
 * needed so that it has enough value pairs for the position.
 *
 * @param pos The zero-based position that should exist after this call is
 * completed.
 * @return A valid range array.
 */
private COSArray getRangeArray(int pos)
{
    //TODO per "clean code", a method should either 
    // return something or modify something, but not both.
    COSArray rangeArray = (COSArray)stream.getStream().getDictionaryObject(COSName.RANGE);
    if(rangeArray == null)
    {
        rangeArray = new COSArray();
        stream.getStream().setItem(COSName.RANGE, rangeArray);
    }
    // extend range array with default values if needed
    while (rangeArray.size() < (pos + 1) * 2)
    {
        rangeArray.add(new COSFloat(0));
        rangeArray.add(new COSFloat(1));
    }
    return rangeArray;
}
{code}
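
The off-by-one is easier to see without the COS classes. The sketch below models only the sizing loops with a plain List<Float>; RangeArraySketch, extendOld and extendFixed are hypothetical names, and the null check and setItem() call from the real method are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the COSArray handling; illustration only.
final class RangeArraySketch
{
    // Old sizing: for position n the loop stops at n * 2 entries,
    // so asking for position 0 adds nothing and the list stays empty.
    static List<Float> extendOld(List<Float> range, int n)
    {
        while (range.size() < n * 2)
        {
            range.add(-100f); // Lab defaults, wrong for ICCBased
            range.add(100f);
        }
        return range;
    }

    // Fixed sizing: position pos needs (pos + 1) pairs, filled with 0 1.
    static List<Float> extendFixed(List<Float> range, int pos)
    {
        while (range.size() < (pos + 1) * 2)
        {
            range.add(0f);
            range.add(1f);
        }
        return range;
    }
}
```

Calling extendOld(new ArrayList<Float>(), 0) leaves the list empty, which corresponds to the "/Range []" entry from the report, while extendFixed(new ArrayList<Float>(), 0) yields the pair 0 1.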

I will try to create a better fix later this week that returns default values 
if the array doesn't exist or is too small, and creates a correctly sized array 
for writing operations. This will have the advantage that PDF files don't get 
longer, i.e. don't have unneeded default range arrays. (This fix creates a 
default range array)
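
One possible shape for such a read path, again with a plain List<Float> standing in for the COSArray. This is only a sketch under assumptions: RangeReadSketch and rangeFor are hypothetical names, and the eventual PDFBox change may look different.

```java
import java.util.List;

// Hypothetical illustration of a read that never modifies the stored data.
final class RangeReadSketch
{
    // Returns {min, max} for the component at pos; falls back to the
    // defaults 0 1 when the stored list is missing or too short.
    static float[] rangeFor(List<Float> stored, int pos)
    {
        if (stored != null && stored.size() >= (pos + 1) * 2)
        {
            return new float[] { stored.get(pos * 2), stored.get(pos * 2 + 1) };
        }
        return new float[] { 0f, 1f };
    }
}
```

Because nothing is added to the stored list, a document that is only read would be saved without an extra default range array.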

Btw this bug also resulted in an exception in TestExtractText.

The fixed libs will appear within a few hours here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.5-SNAPSHOT/



[jira] [Updated] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2042:


Affects Version/s: 1.8.5
   1.8.4
Fix Version/s: 2.0.0
   1.8.5
 Assignee: Tilman Hausherr
  Summary: ColorSpace with empty Range array  (was: ColorSpace 
without Range)



Re: community bonding period

2014-04-24 Thread Andreas Lehmkühler
Hi,

first of all, thanks for the summary.

I've added some comments inline

> Tilman Hausherr wrote on 23 April 2014 at 16:49:
>
>
> Although I'm only mentoring Shaola, maybe some of it is useful for
> Dimuthu as well:
>
>  From the mentors list:
> ===
> We now are in the community bonding period [1] which lasts until May 19.
> During this period students should learn about your project, your
> release processes, the Apache Way, how we do things around here,
> interact with the community and close any knowledge gaps they might
> have.
> [1]
> http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
> ===
> Here's a FAQ about Apache:
> https://www.apache.org/foundation/faq.html
> IMHO most important are "What is Apache about?" and "What is Apache not
> about?". (My personal addendum to that is "Apache is not like
> Wikipedia". If you've ever edited in wikipedia, you'll notice the
> difference after a few days)
>
> https://www.apache.org/foundation/how-it-works.html
> The roles are simpler than in that text, all committers here are PMC
> members, and the PMC chair (Andreas) is also ASF member.
That's correct. The fact that I'm a member doesn't have any influence on the
project, and just for the record, Jeremias and Jukka (who went emeritus a couple
of days ago) are also members.

> Only committers and above have write access to the official PDFBOX
> repository. So the best would be to set up a copy on an open source
> repository.
> https://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities
>
> We're trying to be transparent. So stuff that deals with the
> implementation of the project should probably be in the ticket. To see
> what I mean, have a look at
> https://issues.apache.org/jira/browse/PDFBOX-615 and the related issues.
> PDFBOX-615 started with "I will be trying to add this functionality this
> week" but it became a huge effort by several people that ended 4 years
> later :-) See also John's remarks about my code. It annoyed me somewhat
> at the beginning, but at the end it resulted in much better code.
+1, it's important that technical discussions are about technical matters and
not about personal matters. So try not to get personal and, maybe even harder,
try not to take any comment personally. :-)

> Note that you can edit in JIRA. See an example here
> https://issues.apache.org/jira/browse/PDFBOX-2039
> i.e. you can modify previous posts.
But please, especially if you're editing older comments, preserve the context.
Don't remove parts others have commented on.

> Stuff that deals with PDFBOX in general is best in this (publicly
> readable) mailing list. The advantage is that others might answer you
> (if they want) when I'm working, sleeping, or not on the internet for
> whatever reason. Stuff that deals with java, svn and maven - e-mail me
> if you don't get the answer within a few minutes from google or from
> stackoverflow, i.e. don't waste time searching.
>
> Using other libraries: this is OK as long as they have an Apache license
> or a compatible license (GPL is not). However we don't use many
> libraries, everything is already big, so if you want, ask first. (Sorry
> if you already mentioned a library, will reread your proposal again
> later) Of course it is always OK to temporarily use whatever you want to
> just test a theory / strategy / algorithm.
> Using other code: the code should rather be your own, but you can use
> small excerpts from stackoverflow.com etc but indicate it in your code
> with a link. Always comment in the code if you were "inspired" by other
> people's code or algorithms or research papers, just look at the existing
> shading code for how I did it.
>
> Don't forget the Apache header in new modules.
Otherwise the CI-build will fail.

> Your code should work on JDK5, so that we can use it in the 1.8 version
> too. So don't use diamond operators, lambda expressions or even
> String.isEmpty().
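A minimal sketch of what the JDK5 restriction means in practice. The class and
method names (Jdk5Compat, isEmpty, nonEmpty) are made up for illustration and
are not part of PDFBox; the sketch only shows one way to avoid the three
constructs named above.

```java
import java.util.ArrayList;
import java.util.List;

public class Jdk5Compat
{
    // String.isEmpty() was added in Java 6, so test the length instead.
    public static boolean isEmpty(String s)
    {
        return s == null || s.length() == 0;
    }

    // No diamond operator (Java 7): repeat the type argument on the right.
    // No lambda expressions (Java 8): use a plain for-each loop.
    public static List<String> nonEmpty(List<String> input)
    {
        List<String> result = new ArrayList<String>();
        for (String s : input)
        {
            if (!isEmpty(s))
            {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<String> names = new ArrayList<String>();
        names.add("Cs6");
        names.add("");
        names.add("DeviceRGB");
        System.out.println(nonEmpty(names)); // prints [Cs6, DeviceRGB]
    }
}
```

Generics and the for-each loop have been available since Java 5, so they are
fine to use.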
>
> IDE: I recommend netbeans but you're free to use your own. Just make
> sure that svn (and whatever the hoster will use) and maven are
> integrated in it, this will make your life easier.
>
> A personal recommendation from my student days in the 80s: don't work
> all night. Such code was usually found to be poor/worthless after I had
> the much needed sleep.
>
> Andreas: correct me if I forgot something.
Maybe some minor but helpful things...

There are some code formatting "rules". We provide a checkstyle config and
an Eclipse-only code formatter ruleset. I have to check whether the latter is
still up to date. Both could/should be used to check the code.

All changes should be provided as a patch against the trunk in a common diff
format so that any of the committers can easily integrate them.
 
> Tilman

Maybe we should add this information to our website as well. At the moment
the ASF CMS doesn't work, but I'm pretty sure that infra is already working on
that issue.

BR
Andreas Lehmkühler


Re: New PDFBox bugfix release 1.8.5

2014-04-24 Thread Andreas Lehmkühler
Hi,

I'm planning to cut the release at the beginning of the next week.

Any objections?

@Maruan
What about your pending javadoc changes? Do you need more time or help? As we
are not in a hurry, it wouldn't be a problem to postpone the release process for
another week or two.

BR
Andreas Lehmkühler

> Andreas Lehmkuehler wrote on 18 April 2014 at 15:36:
>
>
> Hi,
>
> it's time to cut a new bugfix release as there are a lot of fixes
> in our queue. Additionally I already announced a possible new release in the
> second quarter and people are already asking for it. ;-)
>
> WDYT?
> Is there anything we should wait for? Any fix only available in the trunk
> which should be merged into the branch as well? What about the 4 open
> issues [1] marked with fix for 1.8.5?
>
> BR
> Andreas Lehmkühler
>
> [1] http://s.apache.org/VwQ


[jira] [Created] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Venkatesan (JIRA)
Venkatesan created PDFBOX-2043:
--

 Summary: While Reading a PDF which contains Image the Content of 
the PDF is misaligned in the resulting text.
 Key: PDFBOX-2043
 URL: https://issues.apache.org/jira/browse/PDFBOX-2043
 Project: PDFBox
  Issue Type: Bug
 Environment: Visual Studio 2005
Reporter: Venkatesan


We are trying to read the content of a PDF file. The PDF has images in the
header. We use the PDFTextStripper.getText() method. After calling this method,
the resulting text is misaligned compared to the original PDF.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 8:35 AM:
--

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?


was (Author: tilman):
I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

I not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?

> Convert PDF to Image (Strange Color)
> 
>
> Key: PDFBOX-2041
> URL: https://issues.apache.org/jira/browse/PDFBOX-2041
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.4
> Environment: Java(1.7.0_45),   OS (Ubuntu) 
>Reporter: ahfei
> Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
> pdfbox-2041.pdf-1-good.png
>
>
> Using PDFBox, tried to convert PDF to Image file  (case1.pdf, case1.jpg)
> Below is code i'm using : 
> BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);   
>  
> ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 
> 200);
> After convert, this image isn't look like pdf. Half page of it become blue 
> and black color. 
> Attached images & PDF : https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 





[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-24 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: ModifyTest.java

Here is the sample code.
Actually, I do not need to modify the content of the page. The problem is caused
just by calling pdResources.getColorSpaces() and then saving the document.

> ColorSpace without Range
> 
>
> Key: PDFBOX-2042
> URL: https://issues.apache.org/jira/browse/PDFBOX-2042
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Juraj Lonc
> Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf
>
>
> I have PDF document where I am modifying PDPage content stream.
> Saved document is invalid (Adobe reader complains about it).
> I have narrowed it down to ColorSpace. 
> Original document has colorspace:
> /ColorSpace <<
> /Cs6 [/ICCBased <<
> /Alternate /DeviceRGB
> /Filter /FlateDecode
> /Length 2597
> /N 3
> >>]>>
> Modified document has colorspace:
> /ColorSpace <<
> /Cs6 [/ICCBased <<
> /Alternate /DeviceRGB
> /Filter /FlateDecode
> /Length 2597
> /N 3
> /Range []
> >>]>>
> When I manually remove "/Range []" from PDF then Adobe reader opens it 
> without an error.
> Obviously that range is added by calling PDICCBased.getRangeArray(0) 
> somewhere.
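
A minimal sketch of how a getter could produce this kind of side effect,
assuming the reporter's guess about the cause is right. Everything here is a
hypothetical stand-in (LazyRangeDemo, the map used as a dictionary, both getter
names); it is not PDFBox code and does not show the actual PDICCBased
implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyRangeDemo
{
    // Stand-in for a PDF stream dictionary; not a PDFBox class.
    public final Map<String, Object> dict = new LinkedHashMap<String, Object>();

    // Getter with a side effect: a missing Range entry is created and stored,
    // so a later save would write an empty array into the file.
    @SuppressWarnings("unchecked")
    public List<Float> getRangeMutating()
    {
        List<Float> range = (List<Float>) dict.get("Range");
        if (range == null)
        {
            range = new ArrayList<Float>();
            dict.put("Range", range);
        }
        return range;
    }

    // Side-effect-free variant: a missing entry only yields a temporary
    // empty list and the dictionary stays untouched.
    @SuppressWarnings("unchecked")
    public List<Float> getRangeReadOnly()
    {
        List<Float> range = (List<Float>) dict.get("Range");
        return range != null ? range : new ArrayList<Float>();
    }

    public static void main(String[] args)
    {
        LazyRangeDemo demo = new LazyRangeDemo();
        demo.dict.put("N", Integer.valueOf(3));
        demo.getRangeReadOnly();
        System.out.println(demo.dict.containsKey("Range")); // prints false
        demo.getRangeMutating();
        System.out.println(demo.dict.containsKey("Range")); // prints true
    }
}
```

In this sketch the read-only variant leaves the dictionary as it was, which
matches the observation that removing "/Range []" by hand makes the file open
without an error.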


