PDF Text extraction problem

2010-01-28 Thread Amit Lole
Hi, I am trying to extract text from a PDF file using PDFBox 0.7.3, but the output file is incomplete: some pages are missing from the output. Can you please help me resolve this issue? I have attached a sample PDF to this mail. Thanks, Amit

[jira] Updated: (PDFBOX-608) Bug in CMap implementation

2010-01-28 Thread Nicolas PENINGUY (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas PENINGUY updated PDFBOX-608: Attachment: cmap.patch A test case is provided... > Bug in CMap implementation > -

[jira] Created: (PDFBOX-608) Bug in CMap implementation

2010-01-28 Thread Nicolas PENINGUY (JIRA)
Bug in CMap implementation -- Key: PDFBOX-608 URL: https://issues.apache.org/jira/browse/PDFBOX-608 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 0.8.0-incubator Repor

[jira] Resolved: (PDFBOX-50) Hierarchical PDRadioCollections cannot be processed.

2010-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/PDFBOX-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-50. -- Resolution: Fixed Fix Version/s: 1.0.0 With version 904262 I've added the provide

[jira] Resolved: (PDFBOX-596) PDActionURI: invalid getBase()

2010-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/PDFBOX-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-596. --- Resolution: Fixed Fix Version/s: 1.0.0 With version 904255 I've added the prov

[jira] Resolved: (PDFBOX-594) Typo: change getBoderStyle() to getBorderStyle()

2010-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/PDFBOX-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-594. --- Resolution: Fixed Fix Version/s: 1.0.0 I've fixed the typo with version 904236

[jira] Resolved: (PDFBOX-505) Support for adding a textmatrix, textscaling and textrotation

2010-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/PDFBOX-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-505. --- Resolution: Fixed Fix Version/s: 1.0.0 With version 904229 I've added a sample

RE: [idea] PdfReader In Google Summer of Code 2010

2010-01-28 Thread Martinez, Mel - 1004 - MITLL
If you are using a Mac, the NeoOffice version of OpenOffice can open & modify PDF files. I haven't done a lot of that (opening & modifying PDFs) with it, so I don't know how useful/robust that capability is. I have used NeoOffice a lot for its regular 'office productivity' features and like i

Re: [idea] PdfReader In Google Summer of Code 2010

2010-01-28 Thread Dexter Mishra
The name PDFReader gives people the wrong message. People are more attracted to other names, such as composer (can read and compose documents). I can devote some time to the project; I have been using Adobe libraries, and some time back I created a hack of the PDFBox library for our company's purposes. It use

Re: [idea] PdfReader In Google Summer of Code 2010

2010-01-28 Thread Adam
I like the idea. The question is... are there enough people willing to dedicate the time to code this? Also, I disagree about PDFReader being the wrong name. The class still reads a PDF from the file system and processes it so it can be rendered, manipulated, etc. From: Jeremias Maerki

[jira] Commented: (PDFBOX-542) Support for Adobe CFF/Type2 fonts

2010-01-28 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805975#action_12805975 ] Jukka Zitting commented on PDFBOX-542: -- The new files you add in the patch come with h

Re: Re: Schedule

2010-01-28 Thread Andreas Lehmkühler
Hi, Sent: Thu, 28 Jan 2010 From: Jukka Zitting > Hi, > > On Thu, Jan 28, 2010 at 3:28 PM, Martinez, Mel - 1004 - MITLL > wrote: > > Is there any chance we could get v1.0 released in the next two weeks? > > I don't see any big blockers for that, and having a post-graduation > release out wou

Re: Schedule

2010-01-28 Thread Johannes Koch
Jukka Zitting wrote: Does anyone have anything they'd still want done before the release? I did some bug fix patches (PDFBOX-50, PDFBOX-593, PDFBOX-594, PDFBOX-596, PDFBOX-597)... -- Johannes Koch Fraunhofer Institute for Applied Information Technology FIT Web Compliance Center Schloss Bi

Re: Schedule

2010-01-28 Thread Johannes Koch
Igor Podolskiy wrote: Hi, On 28.01.2010 16:47, Jukka Zitting wrote: I don't see any big blockers for that, and having a post-graduation release out would be nice. Does anyone have anything they'd still want done before the release? page labels (PDFBOX-90) would be really nice, and there is a

Re: Schedule

2010-01-28 Thread Igor Podolskiy
Hi, On 28.01.2010 16:47, Jukka Zitting wrote: I don't see any big blockers for that, and having a post-graduation release out would be nice. Does anyone have anything they'd still want done before the release? page labels (PDFBOX-90) would be really nice, and there is a patch ;) (even two of th

Re: Schedule

2010-01-28 Thread Jukka Zitting
Hi, On Thu, Jan 28, 2010 at 3:28 PM, Martinez, Mel - 1004 - MITLL wrote: > Is there any chance we could get v1.0 released in the next two weeks? I don't see any big blockers for that, and having a post-graduation release out would be nice. Does anyone have anything they'd still want done before

Re: [idea] PdfReader In Google Summer of Code 2010

2010-01-28 Thread Ruben Reusser
You might want to have a look at inforama ( http://www.inforama.org ) for PDF composition. It uses OpenOffice to edit and annotate PDF documents and then produces the documents as well. Ruben Dexter Mishra wrote: Well I guess there are already a lot of tools for PDF viewing, the one I would

RE: Schedule

2010-01-28 Thread Martinez, Mel - 1004 - MITLL
Andreas? Jukka? Is there any chance we could get v1.0 released in the next two weeks? If not, that's ok. I'm just trying to do some planning. -Original Message- From: Martinez, Mel - 1004 - MITLL [mailto:m.marti...@ll.mit.edu] Sent: Monday, January 25, 2010 2:24 PM To: dev@pdfbox.apa

Re: [idea] PdfReader In Google Summer of Code 2010

2010-01-28 Thread Jeremias Maerki
I agree, a simple viewer is not so interesting per se (except for the part that can be integrated into another application) but what would really be cool is for PDFBox to also become an OSS alternative for overpriced Acrobat. There are now all these little command-line tools. Each one does somethi