[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 6:29 AM:
--

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

I not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?


was (Author: tilman):
I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

I don't have Ubuntu, so someone else will have to answer that.

 Convert PDF to Image (Strange Color)
 

 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei
 Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
 pdfbox-2041.pdf-1-good.png


 Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
 Below is the code I'm using:
 BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
 ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);
 After conversion, the image doesn't look like the PDF. Half of the page becomes 
 blue and black. 
 Attached images and PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-24 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: ModifyTest.java

Here is the sample code.
Actually I do not need to modify content of page. Problem is caused just by 
calling pdResources.getColorSpaces(); and then saving document.

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.
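
For background: in the PDF spec the optional /Range entry of an ICCBased stream holds 2 x N numbers (min and max per component, default 0 1), so an empty array with /N 3 is malformed. A minimal sketch of that check, assuming plain float arrays instead of PDFBox objects; the class and method names are made up for illustration and are not part of PDFBox:

```java
// Hypothetical stand-alone check, not PDFBox API.
final class IccRangeCheck
{
    // True if the array has exactly 2 * n entries and every min is <= its max.
    // A missing /Range entry is legal in a PDF and is not modelled here.
    static boolean isValidRange(float[] range, int n)
    {
        if (range == null || range.length != 2 * n)
        {
            return false;
        }
        for (int i = 0; i < range.length; i += 2)
        {
            if (range[i] > range[i + 1])
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        // "/Range []" together with "/N 3", as in the modified document
        System.out.println(isValidRange(new float[0], 3));                  // false
        // the default range written out explicitly for three components
        System.out.println(isValidRange(new float[] {0, 1, 0, 1, 0, 1}, 3)); // true
    }
}
```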





[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 8:35 AM:
--

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?


was (Author: tilman):
I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

I not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?

 Convert PDF to Image (Strange Color)
 

 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei
 Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
 pdfbox-2041.pdf-1-good.png


 Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
 Below is the code I'm using:
 BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
 ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);
 After conversion, the image doesn't look like the PDF. Half of the page becomes 
 blue and black. 
 Attached images and PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 





[jira] [Created] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Venkatesan (JIRA)
Venkatesan created PDFBOX-2043:
--

 Summary: While Reading a PDF which contains Image the Content of 
the PDF is misaligned in the resulting text.
 Key: PDFBOX-2043
 URL: https://issues.apache.org/jira/browse/PDFBOX-2043
 Project: PDFBox
  Issue Type: Bug
 Environment: Visual Studio 2005
Reporter: Venkatesan


We are trying to read the content of a PDF file. The PDF has images in the 
header. We use the PDFTextStripper.getText() method. After calling this method, 
the resulting text is misaligned compared to the original PDF.





Re: community bonding period

2014-04-24 Thread Andreas Lehmkühler
Hi,

first of all, thanks for the summary.

I've added some comments inline

 Tilman Hausherr thaush...@t-online.de wrote on 23 April 2014 at 16:49:


 Although I'm only mentoring Shaola, maybe some of it is useful for
 Dimuthu as well:

  From the mentors list:
 ===
 We now are in the community bonding period [1] which lasts until May 19.
 During this period students should learn about your project, your
 release processes, the Apache Way, how we do things around here,
 interact with the community and close any knowledge gaps they might
 have. [1]
 http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
 ===
 Here's a FAQ about Apache:
 https://www.apache.org/foundation/faq.html
 IMHO most important are "What is Apache about?" and "What is Apache not
 about?". (My personal addendum to that is "Apache is not like
 Wikipedia". If you've ever edited in Wikipedia, you'll notice the
 difference after a few days.)

 https://www.apache.org/foundation/how-it-works.html
 The roles are simpler than in that text, all committers here are PMC
 members, and the PMC chair (Andreas) is also ASF member.
That's correct. The fact that I'm a member doesn't have any influence on the
project, and just for the record, Jeremias and Jukka (who went emeritus a couple
of days ago) are also members.

 Only committers and above have write access to the official PDFBOX
 repository. So the best would be to set up a copy on an open source
 repository.
 https://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities

 We're trying to be transparent. So stuff that deals with the
 implementation of the project should probably be in the ticket. To see
 what I mean, have a look at
 https://issues.apache.org/jira/browse/PDFBOX-615 and the related issues.
 PDFBOX-615 started with "I will be trying to add this functionality this
 week" but it became a huge effort by several people that ended 4 years
 later :-) See also John's remarks about my code. It annoyed me somewhat
 at the beginning, but at the end it resulted in much better code.
+1, it's important that technical discussions are about technical matters and
not about personal matters. So try not to get personal and, maybe even harder,
try not to take any comment personally. :-)

 Note that you can edit in JIRA. See an example here
 https://issues.apache.org/jira/browse/PDFBOX-2039
 i.e. you can modify previous posts.
But please, especially if you're editing older comments, preserve the context.
Don't remove parts others have commented on.

 Stuff that deals with PDFBOX in general is best in this (publicly
 readable) mailing list. The advantage is that others might answer you
 (if they want) when I'm working, sleeping, or not on the internet for
 whatever reason. Stuff that deals with java, svn and maven - e-mail me
 if you don't get the answer within a few minutes from google or from
 stackoverflow, i.e. don't waste time searching.

 Using other libraries: this is OK as long as they have an Apache license
 or a compatible license (GPL is not). However we don't use many
 libraries, everything is already big, so if you want, ask first. (Sorry
 if you already mentioned a library, will reread your proposal again
 later) Of course it is always OK to temporarily use whatever you want to
 just test a theory / strategy / algorithm.
 Using other code: the code should rather be your own, but you can use
 small excerpts from stackoverflow.com etc., but indicate it in your code
 with a link. Always comment in the code if you were inspired by other
 people's code or algorithms or research papers, just look at the existing
 shading code for how I did it.

 Don't forget the Apache header in new modules.
Otherwise the CI-build will fail.

 Your code should work on JDK5, so that we can use it in the 1.8 version
 too. So don't use diamond operators, lambda expressions or even
 String.isEmpty().
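
A minimal sketch of what the JDK5 restriction means in practice; the class and method names are only illustrative and do not come from the PDFBox code base:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: each method avoids one of the features named above.
final class Jdk5Style
{
    // String.isEmpty() arrived in Java 6; length() == 0 also works on JDK5.
    static boolean isNullOrEmpty(String s)
    {
        return s == null || s.length() == 0;
    }

    static List<String> sortedByLength(List<String> input)
    {
        // Explicit type argument instead of the Java 7 diamond "new ArrayList<>()".
        List<String> result = new ArrayList<String>(input);
        // Anonymous class instead of a Java 8 lambda. No @Override here, because
        // Java 5 rejects that annotation on methods implementing an interface.
        Collections.sort(result, new Comparator<String>()
        {
            public int compare(String a, String b)
            {
                return a.length() - b.length();
            }
        });
        return result;
    }
}
```

Building with the compiler source and target level set to 1.5 is one way to catch the syntax features early, but it will not flag newer library methods such as String.isEmpty() unless the build also runs against a JDK5 class library.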

 IDE: I recommend netbeans but you're free to use your own. Just make
 sure that svn (and whatever the hoster will use) and maven are
 integrated in it, this will make your life easier.

 A personal recommendation from my student days in the 80s: don't work
 all night. Such code was usually found to be poor/worthless after I had
 the much needed sleep.

 Andreas: correct me if I forgot something.
Maybe some minor but helpful things...

There are some code formatting rules. We provide a checkstyle config and
an Eclipse-only code formatter ruleset. I have to check whether the latter is
still up to date. Both could/should be used to check the code.

All changes should be provided as a patch against the trunk in a common diff
format so that they can be easily integrated by any of the committers.
 
 Tilman

Maybe we should add this information to our website as well. At the moment
the ASF CMS doesn't work, but I'm pretty sure that infra is already working on
that issue.

BR
Andreas Lehmkühler


[jira] [Updated] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2042:


Affects Version/s: 1.8.5
   1.8.4
Fix Version/s: 2.0.0
   1.8.5
 Assignee: Tilman Hausherr
  Summary: ColorSpace with empty Range array  (was: ColorSpace 
without Range)

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979838#comment-13979838
 ] 

Tilman Hausherr commented on PDFBOX-2042:
-

Thanks for the test.

Ouch!!! private COSArray getRangeArray(int n) does something sometimes _and_ 
returns something, and has two bugs. What I think was intended was to extend 
the range array with default values when needed. However the default values are 
the wrong ones (should be 0 1 according to the spec, not -100 100, this is for 
LAB), and the array extension isn't done because of an off-by-one mistake. I 
have committed a fix in rev 1589767 for the trunk and rev 1589769 for the 1.8 
branch.

Before:
{code}
private COSArray getRangeArray(int n)
{
    COSArray rangeArray = (COSArray) stream.getStream().getDictionaryObject(COSName.RANGE);
    if (rangeArray == null)
    {
        rangeArray = new COSArray();
        stream.getStream().setItem(COSName.RANGE, rangeArray);
        while (rangeArray.size() < n * 2)
        {
            rangeArray.add(new COSFloat(-100));
            rangeArray.add(new COSFloat(100));
        }
    }
    return rangeArray;
}
{code}

After:

{code}
/**
 * Get the range array, create and fill it with default values (0, 1) if
 * needed so that it has enough value pairs for the position.
 *
 * @param pos The zero-based position that should exist after this call is
 * completed.
 * @return A valid range array.
 */
private COSArray getRangeArray(int pos)
{
    //TODO per clean code, a method should either 
    // return something or modify something, but not both.
    COSArray rangeArray = (COSArray) stream.getStream().getDictionaryObject(COSName.RANGE);
    if (rangeArray == null)
    {
        rangeArray = new COSArray();
        stream.getStream().setItem(COSName.RANGE, rangeArray);
    }
    // extend range array with default values if needed
    while (rangeArray.size() < (pos + 1) * 2)
    {
        rangeArray.add(new COSFloat(0));
        rangeArray.add(new COSFloat(1));
    }
    return rangeArray;
}
{code}
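
The off-by-one can be seen in isolation with a small stand-alone model. This is only a sketch: a List<Float> stands in for COSArray, the loop conditions are size < n * 2 (before) and size < (pos + 1) * 2 (after), and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the two loop conditions; not PDFBox code.
final class RangeExtensionDemo
{
    // "Before": zero-based position n, bound n * 2, defaults -100 / 100.
    static List<Float> extendOld(int n)
    {
        List<Float> range = new ArrayList<Float>();
        while (range.size() < n * 2)
        {
            range.add(-100f);
            range.add(100f);
        }
        return range;
    }

    // "After": zero-based position pos, bound (pos + 1) * 2, defaults 0 / 1.
    static List<Float> extendNew(int pos)
    {
        List<Float> range = new ArrayList<Float>();
        while (range.size() < (pos + 1) * 2)
        {
            range.add(0f);
            range.add(1f);
        }
        return range;
    }

    public static void main(String[] args)
    {
        System.out.println(extendOld(0)); // [] : the empty array from the bug report
        System.out.println(extendNew(0)); // [0.0, 1.0]
        System.out.println(extendNew(2)); // [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    }
}
```

With position 0 the old bound is 0, so the loop body never runs and the freshly created array stays empty, which matches the "/Range []" seen in the saved file.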

I will try to create a better fix later this week that returns default values 
if the array doesn't exist or is too small, and creates a correctly sized array 
for writing operations. This will have the advantage that PDF files don't get 
longer, i.e. don't have unneeded default range arrays. (This fix creates a 
default range array)
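
A rough sketch of what such a read path could look like, assuming a List<Float> as a stand-in for the stored range array; RangeReader and rangeForComponent are hypothetical names, and this is not the code that was eventually committed:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only lookup; the stored list is never modified.
final class RangeReader
{
    // Returns {min, max} for the zero-based component pos, falling back to
    // the default {0, 1} when the stored array is missing or too short.
    static float[] rangeForComponent(List<Float> stored, int pos)
    {
        if (stored == null || stored.size() < (pos + 1) * 2)
        {
            return new float[] { 0f, 1f };
        }
        return new float[] {
            stored.get(pos * 2).floatValue(),
            stored.get(pos * 2 + 1).floatValue()
        };
    }

    public static void main(String[] args)
    {
        List<Float> missing = Collections.<Float>emptyList();
        List<Float> twoPairs = Arrays.asList(0f, 100f, -100f, 100f);
        System.out.println(Arrays.toString(rangeForComponent(missing, 0)));  // [0.0, 1.0]
        System.out.println(Arrays.toString(rangeForComponent(twoPairs, 1))); // [-100.0, 100.0]
        System.out.println(Arrays.toString(rangeForComponent(twoPairs, 2))); // [0.0, 1.0]
    }
}
```

Because nothing is written back, a document that is only read and then saved keeps exactly the range entries it had before.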

Btw this bug also resulted in an exception in TestExtractText.

The fixed libs will appear within a few hours here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.5-SNAPSHOT/

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Commented] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979846#comment-13979846
 ] 

Tilman Hausherr commented on PDFBOX-2043:
-

Please attach a PDF file, a tell what you would expect, and what you got, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?

 While Reading a PDF which contains Image the Content of the PDF is misaligned 
 in the resulting text.
 

 Key: PDFBOX-2043
 URL: https://issues.apache.org/jira/browse/PDFBOX-2043
 Project: PDFBox
  Issue Type: Bug
 Environment: Visual Studio 2005
Reporter: Venkatesan

 We are trying to read the content of a PDF file. The PDF has images in the 
 header. We use the PDFTextStripper.getText() method. After calling this 
 method, the resulting text is misaligned compared to the original PDF.





[jira] [Comment Edited] (PDFBOX-2043) While Reading a PDF which contains Image the Content of the PDF is misaligned in the resulting text.

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979846#comment-13979846
 ] 

Tilman Hausherr edited comment on PDFBOX-2043 at 4/24/14 3:36 PM:
--

Please attach a PDF file, tell what you expected, and what you got instead, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?


was (Author: tilman):
Please attach a PDF file, a tell what you would expect, and what you got, and 
which PDFBox version you used. And what exactly is the role of Visual Studio 
2005 there? Are you using a .net version of PDFBox?

 While Reading a PDF which contains Image the Content of the PDF is misaligned 
 in the resulting text.
 

 Key: PDFBOX-2043
 URL: https://issues.apache.org/jira/browse/PDFBOX-2043
 Project: PDFBox
  Issue Type: Bug
 Environment: Visual Studio 2005
Reporter: Venkatesan

 We are trying to read the content of a PDF file. The PDF has images in the 
 header. We use the PDFTextStripper.getText() method. After calling this 
 method, the resulting text is misaligned compared to the original PDF.





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979853#comment-13979853
 ] 

Andreas Lehmkühler commented on PDFBOX-2042:


[~tilman] There is another, more important bug to be fixed: a read operation 
must not alter the PDF. That said, please remove the setItem() call.

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980070#comment-13980070
 ] 

Tilman Hausherr commented on PDFBOX-2042:
-

Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Comment Edited] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980070#comment-13980070
 ] 

Tilman Hausherr edited comment on PDFBOX-2042 at 4/24/14 6:37 PM:
--

Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.

And [~chupacabras] gets a saved PDF that does not have an unneeded default 
range as in the previous fix.


was (Author: tilman):
Done in rev 1589827 for the trunk and rev 1589828 for the 1.8 branch. I dropped 
the idea of creating a correct size array (getNumberOfComponents() * 2) for a 
write, because the (deprecated) call setNumberOfComponents() allows dynamic 
change of the component count.

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have PDF document where I am modifying PDPage content stream.
 Saved document is invalid (Adobe reader complains about it).
 I have narrowed it down to ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from PDF then Adobe reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978953#comment-13978953
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 6:48 PM:
--

1. The PDF file is corrupt. A look at it with NOTEPAD++ shows %%EOF and then 
trash characters. Deleting all after that one makes the file much smaller, 
518KB instead of 4,85MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some time.
3. The 2.0 version isn't able to open it with the non sequential parser, the 
sequential parser can open it.
4. The 1.8 version renders it fine, the 2.0 version has many glyphs missing, 
maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 
version that I use for myself. I will handle that problem in PDFBOX-2044.
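
Point 1 can also be done without a text editor. A minimal sketch, assuming the trash follows the last %%EOF marker, that the trash itself does not contain another %%EOF, and that the file fits in memory; EofTrimmer and its methods are made-up names and not part of PDFBox. It uses java.nio.file, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical stand-alone helper, not part of PDFBox.
final class EofTrimmer
{
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns the index of the last occurrence of marker in data, or -1.
    static int lastIndexOf(byte[] data, byte[] marker)
    {
        for (int i = data.length - marker.length; i >= 0; i--)
        {
            boolean match = true;
            for (int j = 0; j < marker.length; j++)
            {
                if (data[i + j] != marker[j])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                return i;
            }
        }
        return -1;
    }

    // Keeps everything up to and including the last %%EOF, followed by one
    // newline. Returns the input array unchanged if there is no marker.
    static byte[] trimAfterLastEof(byte[] data)
    {
        int pos = lastIndexOf(data, EOF_MARKER);
        if (pos < 0)
        {
            return data;
        }
        byte[] trimmed = Arrays.copyOf(data, pos + EOF_MARKER.length + 1);
        trimmed[trimmed.length - 1] = '\n';
        return trimmed;
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 2)
        {
            System.err.println("usage: java EofTrimmer <input.pdf> <output.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), trimAfterLastEof(data));
    }
}
```

The result is written to a second file so the original attachment stays available for comparison.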


was (Author: tilman):
1. The PDF file is corrupt. A look at it with NOTEPAD++ shows %%EOF and then 
trash characters. Deleting all after that one makes the file much smaller, 
518KB instead of 4,85MB. How did you get that file?!
2. I am able to render it. Your jpg file looks like it was cut off at some time.
3. The 2.0 version isn't able to open it with the non sequential parser, the 
sequential parser can open it.
4. The 1.8 version renders it fine, the 2.0 version has many glyphs missing, 
maybe a duplicate of PDFBOX-2037. I was able to render it with a modified 2.0 
version that I use for myself.

 Convert PDF to Image (Strange Color)
 

 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei
 Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
 pdfbox-2041.pdf-1-good.png


 Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
 Below is the code I'm using:
 BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
 ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);
 After conversion, the image doesn't look like the PDF. Half of the page becomes 
 blue and black. 
 Attached images and PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri 





[jira] [Created] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2044:
---

 Summary: TrueType glyphs not displayed in rendering
 Key: PDFBOX-2044
 URL: https://issues.apache.org/jira/browse/PDFBOX-2044
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr


In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version.

This is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value (extra in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.
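
The effect can be modelled on a bare offset array. The sketch below is an assumption-laden illustration, not the real GlyphTable.initData(): the 'loca' table is reduced to a long[] with numGlyphs + 1 entries, and scanOld / scanNew only mark which glyphs would be read under the old and the planned rule.

```java
import java.util.Arrays;

// Simplified model of the decision logic; names are invented for the example.
final class LocaScan
{
    // Old rule as described above: stop as soon as an offset equals the
    // final offset, and skip glyphs whose offset equals the next one.
    static boolean[] scanOld(long[] offsets)
    {
        int numGlyphs = offsets.length - 1;
        long endOfGlyphs = offsets[numGlyphs];
        boolean[] read = new boolean[numGlyphs];
        for (int i = 0; i < numGlyphs; i++)
        {
            if (endOfGlyphs == offsets[i])
            {
                break; // "end already reached"
            }
            read[i] = offsets[i] != offsets[i + 1];
        }
        return read;
    }

    // Planned rule: ignore the end check when the final offset is 0, and
    // read a glyph only if the next offset is bigger than its own.
    static boolean[] scanNew(long[] offsets)
    {
        int numGlyphs = offsets.length - 1;
        long endOfGlyphs = offsets[numGlyphs];
        boolean[] read = new boolean[numGlyphs];
        for (int i = 0; i < numGlyphs; i++)
        {
            if (endOfGlyphs != 0 && endOfGlyphs == offsets[i])
            {
                break;
            }
            read[i] = offsets[i + 1] > offsets[i];
        }
        return read;
    }

    public static void main(String[] args)
    {
        long[] broken = {0, 100, 250, 250, 0}; // final offset is 0, as in the PDFBOX-2041 file
        System.out.println(Arrays.toString(scanOld(broken))); // [false, false, false, false]
        System.out.println(Arrays.toString(scanNew(broken))); // [true, true, false, false]
    }
}
```

In this model the last glyph of the broken table is still treated as empty, because its length cannot be derived from a final offset of 0.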





[jira] [Updated] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2044:


Description: 
In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version. It 
works in the 1.8 version which uses awt.

The cause is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value (extra in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.

  was:
In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version.

This is related to the truetype 'loca' table:
https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html

In the table of that file, the endOfGlyphs variable, which is the last offset 
value (extra in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
read any glyphs because it believes that the end has already been reached, 
because the first offset is (of course) 0 and is identical to the endOfGlyphs 
variable.

I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, and 
not asking for offset equality to skip glyphs, instead I require that the next 
offset is bigger.


 TrueType glyphs not displayed in rendering
 --

 Key: PDFBOX-2044
 URL: https://issues.apache.org/jira/browse/PDFBOX-2044
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr

 In the file of PDFBOX-2041 the text isn't displayed in the 2.0 version. It 
 works in the 1.8 version which uses awt.
 The cause is related to the truetype 'loca' table:
 https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html
 In the table of that file, the endOfGlyphs variable, which is the last offset 
 value (extra in the spec), is 0. Therefore, GlyphTable.initData() doesn't 
 read any glyphs because it believes that the end has already been reached, 
 because the first offset is (of course) 0 and is identical to the endOfGlyphs 
 variable.
 I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0, 
 and not asking for offset equality to skip glyphs, instead I require that the 
 next offset is bigger.





[jira] [Resolved] (PDFBOX-1069) Ubuntu throws exceptions when fonts missing

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-1069.
-

   Resolution: Later
Fix Version/s: 2.0.0
   1.8.5

 Ubuntu throws exceptions when fonts missing
 ---

 Key: PDFBOX-1069
 URL: https://issues.apache.org/jira/browse/PDFBOX-1069
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.6.0
 Environment: Ubuntu 10.10
Reporter: Sarah Kelley
 Fix For: 1.8.5, 2.0.0

 Attachments: sakelley_pdf_rendering_problem.zip


 On a plain vanilla Ubuntu 10.10 install, running
 run-all failed to render any text, and threw lots of exceptions:
 
 
 org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:425)
 
 ...however, installing the package ttf-mscorefonts-installer
 made those exceptions go away.
 (ubuntu1010_output.txt shows the exceptions; ubuntu1010_try2_output.txt is a 
 run after the extra fonts are installed)
 
 Might be able to fix this one by setting UNKNOWN_FONT in
 Resources/PDFBox_External_Fonts.properties, but it would seem like
 it should choose some reasonable default if it isn't set...
 shouldn't it?





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980141#comment-13980141
 ] 

Andreas Lehmkühler commented on PDFBOX-2042:


[~tilman] Thanks for the prompt action

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have a PDF document in which I am modifying a PDPage content stream.
 The saved document is invalid (Adobe Reader complains about it).
 I have narrowed it down to the ColorSpace entry.
 The original document has this color space:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 The modified document has this color space:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from the PDF, Adobe Reader opens it
 without an error.
 Obviously that range is added by a call to PDICCBased.getRangeArray(0)
 somewhere.
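
 One way to avoid writing an empty array is sketched below. This is not the
 actual PDFBox fix; the class and method are hypothetical. It relies on the
 PDF specification's rule that /Range for an ICCBased stream holds 2 x N
 numbers, so anything else (including an empty array) is simply left out and
 the reader falls back to the default range.

```java
// Hypothetical helper, not part of PDFBox.
class IccRangeSketch {
    // Builds the /Range entry for an ICCBased stream dictionary with n
    // components. Returns an empty string when the range is missing or does
    // not hold exactly 2 * n numbers, so no "/Range []" is ever written.
    static String rangeEntry(float[] range, int n) {
        if (range == null || range.length != 2 * n) {
            return "";
        }
        StringBuilder sb = new StringBuilder("/Range [");
        for (int i = 0; i < range.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(range[i]);
        }
        return sb.append(']').toString();
    }
}
```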





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980165#comment-13980165
 ] 

Juraj Lonc commented on PDFBOX-2042:


Thanks for the fix ;)






[jira] [Resolved] (PDFBOX-2044) TrueType glyphs not displayed in rendering

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2044.
-

   Resolution: Fixed
Fix Version/s: 2.0.0

Fixed in rev 1589893 for the trunk.

 TrueType glyphs not displayed in rendering
 --

 Key: PDFBOX-2044
 URL: https://issues.apache.org/jira/browse/PDFBOX-2044
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.0


 In the file from PDFBOX-2041 the text isn't displayed in the 2.0 version. It
 works in the 1.8 version, which uses AWT.
 The cause is related to the TrueType 'loca' table:
 https://developer.apple.com/fonts/TTRefMan/RM06/Chap6loca.html
 In the table of that file, the endOfGlyphs variable, which is the last offset
 value (the extra entry in the spec), is 0. Therefore GlyphTable.initData()
 doesn't read any glyphs: the first offset is (of course) 0, which is identical
 to endOfGlyphs, so the method believes that the end has already been reached.
 I will fix this by disregarding endOfGlyphs == offset if endOfGlyphs is 0,
 and by no longer testing for offset equality to skip glyphs; instead, a glyph
 is read only if the next offset is bigger.
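
 The described logic can be sketched roughly as below. This is a simplified
 illustration and not the committed code: the class and method names are
 hypothetical, and instead of parsing glyph data it only returns the indices
 of glyphs that would be read, given the numGlyphs + 1 offsets of a 'loca'
 table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual GlyphTable.initData() implementation.
class LocaSketch {
    // offsets holds numGlyphs + 1 entries; the last one marks the end of
    // the glyph data.
    static List<Integer> glyphsWithData(long[] offsets) {
        List<Integer> result = new ArrayList<>();
        if (offsets.length < 2) {
            return result; // no glyph entries at all
        }
        int numGlyphs = offsets.length - 1;
        long endOfGlyphs = offsets[numGlyphs];
        for (int i = 0; i < numGlyphs; i++) {
            // Only trust the end marker when it is not 0; otherwise the
            // first offset (0) would look like the end of the data.
            if (endOfGlyphs != 0 && offsets[i] == endOfGlyphs) {
                break;
            }
            // A glyph has data only if the next offset is strictly bigger.
            if (offsets[i + 1] <= offsets[i]) {
                continue;
            }
            result.add(i);
        }
        return result;
    }
}
```

 With offsets {0, 100, 100, 250, 0} the end marker is 0, as in the problem
 file; the sketch still reports glyphs 0 and 2, whereas a plain
 endOfGlyphs == offset check would stop at the first glyph.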





[jira] [Resolved] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2042.
-

Resolution: Fixed

Thanks for the feedback. Btw, PDLab has the same problem; I'll create an issue
soon.






[jira] [Comment Edited] (PDFBOX-2041) Convert PDF to Image (Strange Color)

2014-04-24 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979239#comment-13979239
 ] 

Tilman Hausherr edited comment on PDFBOX-2041 at 4/24/14 10:57 PM:
---

I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not? 

And what Java version are you using? If it isn't the latest, what happens if
you update to Java SE 7 Update 55?


was (Author: tilman):
I didn't mean to remove %%EOF, just everything after it.

Could it be your Ubuntu disk is full?

If not - I don't have Ubuntu, so someone else will have to answer that.

Also try using the PDFBox app:

java -jar pdfbox-app-1.8.4-SNAPSHOT.jar PDFReader yourfile.pdf

does it display correctly or not?

 Convert PDF to Image (Strange Color)
 

 Key: PDFBOX-2041
 URL: https://issues.apache.org/jira/browse/PDFBOX-2041
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4
 Environment: Java(1.7.0_45),   OS (Ubuntu) 
Reporter: ahfei
 Attachments: PDFBOX-2041.pdf, PDFBOX-2041.pdf-1-bad.tif, 
 pdfbox-2041.pdf-1-good.png


 Using PDFBox, I tried to convert a PDF to an image file (case1.pdf, case1.jpg).
 Below is the code I'm using:
 BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 200);
 ImageIOUtil.writeImage(image, "jpg", imagePath, BufferedImage.TYPE_INT_RGB, 200);
 After conversion, the image doesn't look like the PDF: half of the page turns
 blue and black.
 Attached images and PDF: https://www.dropbox.com/sh/jevegc8bh09km1o/5XkVwPUxri


