Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
16 (the) -384 (competent) -383 (authority) -383 (has) -384 (the) -383 (right) ] TJ The text is between the parentheses and the numbers are used for horizontal positioning. BR Andreas On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. <heshamgne...@gmail.com> wrote: > Hello , > > I have a PDF
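
To make the TJ fragment quoted above concrete, here is a minimal sketch in plain Java (no PDFBox calls) that joins the strings of such an array and puts a space wherever the positioning number is strongly negative. In a TJ array the numbers are thousandths of a text space unit subtracted from the current position, so a value like -384 opens a gap to the right. The class name `TjArraySpaces`, the method `decode`, the threshold value, and the leading `[` are assumptions made for illustration; the sketch also ignores escaped parentheses inside strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class TjArraySpaces {
    // Matches either a literal string "(...)" or a number inside a TJ array.
    private static final Pattern TOKEN =
            Pattern.compile("\\(([^)]*)\\)|(-?\\d+(?:\\.\\d+)?)");

    /**
     * Joins the strings of a TJ array and inserts a space wherever the
     * adjustment between two strings is at or below -threshold.
     */
    static String decode(String tjArray, double threshold) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(tjArray);
        while (m.find()) {
            if (m.group(1) != null) {
                // A string operand: copy its text.
                out.append(m.group(1));
            } else if (out.length() > 0
                    && Double.parseDouble(m.group(2)) <= -threshold) {
                // A large negative adjustment: treat it as a word gap.
                out.append(' ');
            }
        }
        return out.toString();
    }
}
```

With an assumed threshold of 200, `TjArraySpaces.decode("[ (the) -384 (competent) -383 (authority) -383 (has) -384 (the) -383 (right) ] TJ", 200)` gives `the competent authority has the right`. Small kerning adjustments inside a word stay below the threshold and do not produce a space.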

Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
Andreas, You're absolutely right. I am testing it now, but it seems very complicated. I hope there might be another easier solution. Best regards , Hesham Included message : "Hesham G." <heshamgne...@g

Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
had a chance to do its job, such as inferring the missing spaces. You should follow our PrintTextLocations.java example which shows you how to get the processed TextPositions from PDFTextStripper. It’s really easy to do. — John > On 17 Mar 2016, at 04:44, Hesham G. <heshamgne...@gmail.

Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
Hello , I have a PDF file created using Latex. I am trying to read and print all letters in that file using PDFBox, but when doing this all spaces in that file are ignored. Here is the code I am using: PDPage page = (PDPage)allPages.get( 0 ); PDStream contents = page.getContents(); if (

Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
and then use extractText to get the words. 2016-03-17 7:20 GMT-03:00 Hesham G. <heshamgne...@gmail.com>: Andreas, That is very helpful. I can get the x location of each character using TextPosition.getX(), ex: W: 102.88399 i: 114.18165 t: 117.660614 h: 121.55801 d: 133.09477 u: 140.
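
The idea of using the x location of each character can be sketched as follows, in plain Java without PDFBox. The sketch assumes that, besides the x position, the width of each glyph is also available; `Glyph`, `GapSpaces`, `join` and `minGap` are hypothetical names, and a suitable `minGap` depends on the font size.

```java
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class GapSpaces {
    /** Stand-in for the x position and width reported for one character. */
    static final class Glyph {
        final char c;
        final float x;
        final float width;

        Glyph(char c, float x, float width) {
            this.c = c;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Concatenates the characters and inserts a space when the gap between
     * the end of one glyph and the start of the next exceeds minGap.
     */
    static String join(List<Glyph> glyphs, float minGap) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null && g.x - (prev.x + prev.width) > minGap) {
                out.append(' ');
            }
            out.append(g.c);
            prev = g;
        }
        return out.toString();
    }
}
```

Comparing start positions alone is not enough, because a wide letter such as W pushes the next character far to the right without any space in between. That is why the sketch measures from the end of the previous glyph.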

Re: Spaces are ignored when reading a PDF file

2016-03-18 Thread Hesham G.
Included message : Am 17.03.2016 um 07:12 schrieb Hesham G.: Hello , I have a PDF file created using Latex. I am trying to read and print all letters in that file using PDFBox, but when doing this all spaces in that file are ignored. Here's what I get

Re: Spaces are ignored when reading a PDF file

2016-03-18 Thread Hesham G.
ws you how to get the processed TextPositions from PDFTextStripper. It’s really easy to do. — John > On 17 Mar 2016, at 04:44, Hesham G. <heshamgne...@gmail.com> wrote: > > Andreas, > > You're absolutely r

Re: Reading text using TextPosition

2015-04-26 Thread Hesham G.
PM, Hesham G. heshamgne...@gmail.com wrote: Frank , I have handled TextPositions using X Y coordinates as you have suggested to detect new lines. It works fine, but if a sentence is written on 2 lines I can't detect it. If you know a trick to detect that it will help a lot. Best regards
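
For reference, detecting new lines from the Y coordinate can be sketched like this in plain Java. `LineGrouper`, `Glyph` and `yTolerance` are hypothetical names, and the glyphs are assumed to arrive in reading order. The sketch only splits lines; it does not decide whether two lines belong to the same sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class LineGrouper {
    /** Stand-in for the text and position of one character. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Starts a new line whenever Y changes by more than yTolerance. */
    static List<String> lines(List<Glyph> glyphs, float yTolerance) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Float lastY = null;
        for (Glyph g : glyphs) {
            if (lastY != null && Math.abs(g.y - lastY) > yTolerance) {
                result.add(current.toString());
                current.setLength(0);
            }
            current.append(g.text);
            lastY = g.y;
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }
}
```

The tolerance keeps small baseline shifts on the same line. One possible heuristic for a sentence that continues on the next line, which is only an assumption here, is to check whether the previous line ends with sentence-ending punctuation.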

Re: Reading text using TextPosition

2015-04-22 Thread Hesham G.
foolproof, since things like subscripts and superscripts are out of order when sorted by Y. Where there are multiple columns, this won't work. Frank On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. heshamgne...@gmail.com wrote: Hello , When reading PDF text using TextPosition, is there a way to know

Reading text using TextPosition

2015-04-21 Thread Hesham G.
Hello , When reading PDF text using TextPosition, is there a way to know if the current character is a new line character ? protected void processTextPosition( TextPosition text ) { System.out.println( text.getCharacter() ); // Prints space if this is a new line character in the PDF

Read current letter's text color

2015-03-25 Thread Hesham G.
Hello , When reading a page content in a pdf file using the processTextPosition(...) method, is there a way to know the current letter’s text color ? Best regards , Hesham

Highlight text in a PDF if a link is clicked

2014-11-14 Thread Hesham G.
Hello , I was wondering if it is possible to highlight specific text in a page in a pdf if a link in another page was clicked to go to that page then highlight the text I want inside it ? I wonder if JavaScript would be able to do that for example ? Best regards , Hesham

Using PDFBox default fonts

2014-04-17 Thread Hesham G.
Hello , If I use any of the PDFBox fonts to write text in a PDF like “TIMES_ROMAN - HELVETICA - COURIER - SYMBOL” then those fonts are not being embedded inside the PDF, so if I want to embed the font inside the PDF should I prevent using PDFBox fonts and use my own TTF font files instead ?

Wrong space parsed pdf

2014-03-25 Thread Hesham G.
Hello , While reading a pdf using PDFBox 1.7.1 many spaces are being ignored, so words are merged together while reading the pdf. You can test a 1-page sample PDF from here : http://www.4shared.com/office/yqJGUZn2ce/wrong_space_parsed_sample.html You can see wrong read words like :

Re: Wrong space parsed pdf

2014-03-25 Thread Hesham G.
Tilman , I didn't actually test it, but I might try that version. Best regards , Hesham Included message : Hi, Does this also happen with the current version? (1.8.4) Tilman Am 25.03.2014 13:53, schrieb Hesham G

Can i add a link that opens another pdf

2013-11-11 Thread Hesham G.
Hello , Can I add a link to the pdf using PDFBox that when I click it opens another pdf or an application in a specific path ? Best regards , Hesham

PDF size increased due to creating many links

2013-10-04 Thread Hesham G.
Hello , I have a pdf file that has 300 pages and is 5.7 MB in size. Using PDFBox I add another 230 pages to it with a lot of text to write inside them. After doing this the pdf file size becomes 7.65 MB. That is very good, but I also want to write 1000 of my words inside that pdf as links. In fact I

Re: Uppercase letters are read in lowercase manner

2013-03-26 Thread Hesham G.
Done. I have reported this with a sample file: https://issues.apache.org/jira/browse/PDFBOX-1552 Best regards , Hesham - Included message : Hi, Am 23.03.2013 09:11, schrieb Hesham G.: Andreas , Thank you for your answer : ) Should I add

Re: Uppercase letters are read in lowercase manner

2013-03-23 Thread Hesham G.
thought I'll provide the reason :-) BTW Mac preview has the same issue that pdfbox has - so at least we are not alone. Maruan Sahyoun Am 21.03.2013 um 12:34 schrieb Hesham G. heshamgne...@gmail.com: Maruan , And that is why I have sent this question. The text appears fine in Adobe

Re: [ANNOUNCE] Apache PDFBox 1.8.0 released

2013-03-23 Thread Hesham G.
Congratulations .. That's good to hear. Good work fellows : ) Best regards , Hesham - Included message : The Apache PDFBox community is pleased to announce the release of Apache PDFBox version 1.8.0. The release is available for download at:

Re: Uppercase letters are read in lowercase manner

2013-03-21 Thread Hesham G.
Andreas , I apologize for this ! Please download the PDF from here : https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf Best regards , Hesham - Included message : Hi, Am 18.03.2013 15:43, schrieb Hesham G.: Hello

Re: Uppercase letters are read in lowercase manner

2013-03-21 Thread Hesham G.
with) but is correctly handled in Adobe Reader. Kind regards Maruan Sahyoun Am 21.03.2013 um 07:05 schrieb Hesham G. heshamgne...@gmail.com: Andreas , I apologize for this ! Please download the PDF from here : https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf

Extract PDF Page with italic text

2012-06-15 Thread Hesham G.
Hello , I am trying to extract text from a PDF that has some italic text. All the text is extracted fine, except for the italic text. It appears with strange spaces between them. For example Anthropology of Place and Space is extracted as Anthropology of Place and Space. I don't know what are

Re: Using a system font file with loadTTF method

2012-06-03 Thread Hesham G.
- Included message : Hi, Am 22.05.2012 15:45, schrieb Hesham G.: Hello , This is a more Java question rather than a PDFBox question : ) , but I can't find an answer to it. I am trying to use a system font file in drawing text to a PDF file using PDFBox

Using a system font file with loadTTF method

2012-05-22 Thread Hesham G.
Hello , This is a more Java question rather than a PDFBox question : ) , but I can't find an answer to it. I am trying to use a system font file in drawing text to a PDF file using PDFBox, through the method : PDTrueTypeFont.loadTTF( PDDocument pdfFile, File fontFile ); The problem here is

Re: Can i change the color of the link border ?

2012-03-07 Thread Hesham G.
fine ... Thanks to you Gilad. Best regards , Hesham - Included message : Try the setColour() method of the PDAnnotationLink object (inherited from PDAnnotation). On Wed, Mar 7, 2012 at 9:47 AM, Hesham G. heshamgne...@gmail.com wrote: Hello

Re: Softhyphens / white space

2012-02-10 Thread Hesham G.
Dirk , Did you try to use PDFTextStripper.setAverageCharTolerance( float ) ? Best regards , Hesham - Included message : Hello, I use pdfbox 1.6.0 to extract text from PDFs, which works often fine. Unfortunately it seems to insert a space

Re: How to insert the first bookmark in a pre-existing PDDocumentOutline

2012-01-16 Thread Hesham G.
Gilad , I agree with Maruan, and be sure to keep the same settings for each bookmark as they are (Bold, Italic, Color, ...). And be sure this will work for Named destinations. Best regards , Hesham - Included message : Hi Gilad, what about

Re: Functionality in PDFBOX

2011-11-09 Thread Hesham G.
can get the object id/revision of the page it points to, you should be able to get the page number in the map just like is done with normal bookmarks. On Mon, Nov 7, 2011 at 3:23 PM, Hesham G. heshamgne...@gmail.com wrote: Adam , Did you find a way to map a bookmark to a named destination

Re: Functionality in PDFBOX

2011-11-07 Thread Hesham G.
Adam , Did you find a way to map a bookmark to a named destination in a page(Specific location inside the page) ? Best regards , Hesham - Included message : I've implemented code which splits PDFs based on bookmarks, combines PDFs, and add

Re: Detecting the footnotes in a PDF

2011-10-26 Thread Hesham G.
text correctly from this PDF. (I just ran the org.apache.pdfbox.ExtractText tool). Mike McCandless http://blog.mikemccandless.com On Wed, Oct 26, 2011 at 8:25 AM, Hesham G. heshamgne...@gmail.com wrote: Hello , Is there a way to detect the footnotes section in a PDF file ? Here

Re: PDF with strange extracted text

2011-09-01 Thread Hesham G.
Andreas , Thanks for the explanation. Best regards , Hesham - Included message : Hi, Am 01.09.2011 05:50, schrieb Hesham G.: Mirko , Thanks a lot for your reply. Shouldn't PDFBox handle those ligatures automatically, as stated in the previous

Re: PDF with strange extracted text

2011-08-31 Thread Hesham G.
typically appear visually too far apart without custom kerning. HTH, Mirko On Wed, Aug 31, 2011 at 12:59 PM, Hesham G. heshamgne...@gmail.com wrote: Hello , I have a PDF that I extract its text using PDFBox. The PDF is read fine using Mac's Preview, but in PDFBox some words are read in a strange

Re: PDFBox 1.5 is very slow in loading PDFs

2011-06-29 Thread Hesham G.
Andreas , Thanks a lot ... Happy to hear that. Best regards , Hesham - Included message : Hi, Am 29.06.2011 06:20, schrieb Hesham G.: Hello , I have tried loading the PDF Reference 1.7(1310 pages) using PDFBox version 1.4 and 1.5. Version

PDFBox 1.5 is very slow in loading PDFs

2011-06-28 Thread Hesham G.
Hello , I have tried loading the PDF Reference 1.7(1310 pages) using PDFBox version 1.4 and 1.5. Version 1.4 loaded the PDF in 12 seconds. Version 1.5 loaded the PDF in 2 minutes and 29 seconds. Is this normal ? Best regards , Hesham

Re: Is there a difference between Watermark Stamp ?

2011-06-16 Thread Hesham G.
well. Remember, you can also make modifications to the PDF using a text editor to see what it does (just remember to back up the original before you go hacking up the PDF or you'll probably end up with a corrupt document). Thanks, Adam From: Hesham G. heshamgne...@gmail.com

Can i specify Text opacity ?

2011-06-16 Thread Hesham G.
Hello , When drawing some text in a PDF file, can I specify the text opacity ? I want it to appear dimmed. Best regards , Hesham

Re: Is there a difference between Watermark Stamp ?

2011-06-16 Thread Hesham G.
it. Thanks, Adam From: Hesham G. heshamgne...@gmail.com To: users@pdfbox.apache.org Date: 06/16/2011 07:34 Subject: Re: Is there a difference between Watermark Stamp ? Adam , This post is old, but I am still facing problems using the watermark, and I need it badly. I have created

Can i crop a page using PDFBox ?

2011-06-14 Thread Hesham G.
Hello , Can I use PDFBox to crop a pdf page ? I have a PDF with 2 facing pages that I want to show as 2 pages under each other. I don't want it to be an image, I want each page to have normal text. You can check this 1 page sample PDF:

My pdf can't be loaded by PDFBox

2011-06-04 Thread Hesham G.
Hello , I have a PDF that when I try to load using PDFBox 1.5, the following error is thrown : WARN - Bad Dictionary Declaration org.apache.pdfbox.io.PushBackInputStream@16a4a67 You can download the PDF to test it from here: http://www.4shared.com/document/9ICxOhZW/Dont_make_me_think.html

Re: AW: My pdf can't be loaded by PDFBox

2011-06-04 Thread Hesham G.
, Probably PdfBox can't handle Cross-Reference Streams. Andrey -Ursprüngliche Nachricht- Von: Hesham G. [mailto:heshamgne...@gmail.com] Gesendet: Samstag, 4. Juni 2011 16:32 An: pdfbox-send-question Betreff: My pdf can't be loaded by PDFBox Hello , I have a PDF that when I try to load

Re: Error when extracting PDF

2011-06-01 Thread Hesham G.
Mohit , This is out of scope of your question, but I hope you can explain to me what Tika does exactly ? I have read its description and it is useful in text extraction, but I still do not understand what additions does it give ? Why don't you directly use PDFBox ? Best regards , Hesham

PDFBox extracts fi char wrong

2011-05-26 Thread Hesham G.
Hello , I am using PDFBox version 1.4 to extract text from a PDF, but all the words having fi inside them are extracted wrong. You can test the following 1 page PDF sample: http://www.4shared.com/document/GAMnpE9A/the_fi_char.html I am aware of the post:

A link on 2 lines in a PDF

2011-04-30 Thread Hesham G.
Hello , I have a long link that appears on 2 lines in a PDF. When I use PDFBox v1.5 to parse this PDF: pageData = stripper.getText( pdfFile ); It reads the link on 2 lines, but if I copy the link manually from Adobe it is copied in 1 line. - 1 page sample PDF:

Re: Wrong extracted text order from a PDF

2011-04-15 Thread Hesham G.
Jukka , As always ... Sorry for being late to reply. I have just tested this now ... And it extracts the text just fine. Best regards , Hesham - Included message : Hi, On 04/02/2011 03:24 PM, Hesham G. wrote: I have a PDF file that I am

Re: Insert an image with a link in a PDF

2011-02-23 Thread Hesham G.
Nikhil , Please check this example : http://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/examples/pdmodel/Annotation.java?revision=924515&view=markup Hope that helps. Best regards , Hesham - Included message : I have

Creating a link with no border

2011-01-18 Thread Hesham G.
Hello , I am trying to create a link with no borders. The link appears and works perfect in Adobe reader, but in Mac Preview the link appears with a border around it. Here is my code : PDAnnotationLink link = new PDAnnotationLink(); PDBorderStyleDictionary border = new

Re: Extracting text from Arabic PDF - Text appears reveresed

2011-01-11 Thread Hesham G.
I have now seen that this was fixed before, by including the ICU4J library ... Which is now automatically included in PDFBox 1.4 ... And I was wondering why PDFBox 1.4 size was that big Thanks to the PDFBox guys. Best regards , Hesham - Included

PDFBox 1.4 performance in extracting text is slow

2011-01-10 Thread Hesham G.
Hello , I am still upgrading from PDFBox 0.73 to PDFBox 1.4. The new version is very nice, better extracting results and more PDFs work fine with it. But I have noticed the extracting performance for the new version is much slower than version 0.73. For example I have tested extracting the

Re: Text not extracted with PDFBox 1.4

2011-01-05 Thread Hesham G.
Did anybody test the file ? It is an important issue to me, as it will decide whether I upgrade to PDFBox 1.4 or not. I hope someone checks it. Best regards , Hesham - Included message : Hello , I have used PDFBox v1.2.1 to extract text from a

Text not extracted with PDFBox 1.4

2011-01-04 Thread Hesham G.
Hello , I have used PDFBox v1.2.1 to extract text from a PDF file, and it works perfect. But now I have tested it with PDFBox v1.4 and most of the text is not extracted. You can test this 1 page pdf file for this : http://www.4shared.com/document/LgGk-OHi/data_not_extracted.html Best regards

Re: Extracting text from Arabic PDF - Text appears reveresed

2011-01-04 Thread Hesham G.
I have noticed the method PDFTextStripper.inspectFontEncoding(...). Would this help with that problem ? Best regards , Hesham - Included message : Hello , I am using PDFBox 1.4 to extract text from an Arabic PDF file. The problem with Arabic is

Warning while parsing a PDF

2010-12-30 Thread Hesham G.
Hello , While parsing a PDF file using PDFBox v1.2.1 I get a warning. When I googled for this warning I see it was a bug and fixed in v0.7.3 ... But it seems to still exist. You can download the PDF file from here : http://www.4shared.com/document/BL3eiOu7/expected_hex_character.html Here is

Disable logging

2010-12-30 Thread Hesham G.
I know this has been asked before, but I can't find its answer yet ! I am upgrading to PDFBox v1.4 and I need to disable the logs printed in the console, as it slows the code a lot while parsing the PDF. How can I do this please ? Best regards , Hesham

Re: Disable logging

2010-12-30 Thread Hesham G.
- Included message : Hi, Am 30.12.2010 20:56, schrieb Hesham G.: I know this has been asked before, but I can't find its answer yet ! I am upgrading to PDFBox v1.4 and I need to disable the logs printed in the console, as it slows the code a lot while parsing the PDF

Re: Disable logging

2010-12-30 Thread Hesham G.
in the logging.properties file will help, from .level=INFO to say to something like: .level=WARN or .level=ERROR Check out the tutorial at http://logging.apache.org/log4j/1.2/manual.html --Ken On Dec 30, 2010, at 4:40 PM, Hesham G. wrote: Thanks Andreas. I have put the properties file
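
As an alternative to editing the properties file, the level can also be set from code. This is a minimal sketch, assuming PDFBox logs through Commons Logging and that Commons Logging falls back to java.util.logging because log4j is not on the classpath; if log4j is in use, this has no effect. The class name `QuietPdfBoxLogging` is hypothetical. Note that java.util.logging names its levels WARNING and SEVERE.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of PDFBox.
final class QuietPdfBoxLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost.
    private static final Logger PDFBOX_LOGGER =
            Logger.getLogger("org.apache.pdfbox");

    /** Lets only SEVERE messages from org.apache.pdfbox.* through. */
    static void silence() {
        PDFBOX_LOGGER.setLevel(Level.SEVERE);
    }
}
```

Calling `QuietPdfBoxLogging.silence()` once before loading the first PDF should be enough, since child loggers inherit the level of `org.apache.pdfbox` unless they set their own.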

Re: Type1C font Error

2010-12-04 Thread Hesham G.
Is your problem related to this : https://issues.apache.org/jira/browse/PDFBOX-708 Best regards , Hesham - Included message : Hello, I am trying to extract text from a set of PDF files. I keep getting the following error for some of the files.

Re: editing, moving comments and bookmarks?

2010-11-18 Thread Hesham G.
Kevin , Hope you can share what you get with us. I am interested in that too. Best regards , Hesham - Included message : On Thu, Nov 18, 2010 at 1:01 PM, a...@swmc.com wrote: First, let me make sure I understand your goals correctly. You have

Re: Created bookmarks does not appear in Mac OS X

2010-10-25 Thread Hesham G.
French words ;)- and acrobat pro) Would you put your code so that I may compare it to mine. Regards, Julien Le 24 oct. 2010 à 23:31, Hesham G. heshamgne...@gmail.com a écrit : I think this answer will not convince my customers :) Ok, I hope someone would check this. If i can give any help, I'd

Re: Setting Text Styles

2010-10-20 Thread Hesham G.
Well, you can use the predefined PDFBox fonts, like this : PDSimpleFont font = PDType1Font.TIMES_ITALIC; Or you can define your own font file, ex : PDTrueTypeFont font = PDTrueTypeFont.loadTTF( pdfFile, new File( "fonts/georgia.TTF" ) ); But replace this font file in the example with a bold or

2 PDFs causing exceptions while loading

2010-09-15 Thread Hesham G.
Hello , I have 2 PDFs that when I try to load them using PDFBox v1.2.1 I get 2 different exceptions. I thought to report this to you if you wanna test them. Here is the first PDF: http://www.4shared.com/document/AvX3PW-e/PDF1_causing_exception_with_PD.html And here is its exception :

Re: Clear bookmark children

2010-09-11 Thread Hesham G.
effectively be the same as clearing it out. Then you can use the old PDOutlineItem to copy over the children that you do want to keep. Thanks, Adam From: Hesham G. heshamgne...@gmail.com To: pdfbox-send-question users@pdfbox.apache.org Date: 09/07/2010 05:33 Subject: Clear

Clear bookmark children

2010-09-07 Thread Hesham G.
Hello , If I have a bookmark node that has many children ... Is there a way to empty it, so I can refill it again from the beginning or just clear some of those children ? Best regards , Hesham

Re: Building a smaller distribution.

2010-09-04 Thread Hesham G.
I suggest the PDFBox team releases a light version for its jar. I see a package inside the new PDFBox jar for version 1.2.1 with the path 'com.ibm.icu' ... I wonder what's that for ! Best regards , Hesham - Included message : We're looking to use

Re: getting and setting names of named destinations

2010-09-01 Thread Hesham G.
Kevin , You said you were having problems setting a named destination. I have tried to define a bookmark named destination but it didn't work. Here is my code : PDNamedDestination pn = new PDNamedDestination(); pn.setNamedDestination( "G917701" ); bookmark.setDestination( pn ); When I open the

importPage does not copy Named destinations

2010-09-01 Thread Hesham G.
Hello , I am using PDDocument.importPage(...) to copy some pages from a PDF to another. There was Named destinations defined inside the source PDF pages, but after copying it using PDFBox, the Named destinations disappeared in the output PDF, and so I can't define bookmarks for it. Here is

Re: getting and setting names of named destinations

2010-09-01 Thread Hesham G.
tho. I am sorry, but I am going out of town for a week. But I will post on this when I get back. Good luck. Kevin On Wed, Sep 1, 2010 at 5:48 AM, Hesham G. heshamgne...@gmail.com wrote: Kevin , You said you were having problems setting a named destination. I have tried to define

Unable to search for PDFBox written text

2010-02-02 Thread Hesham G.
Hello , I have noticed that whenever I create a new page in an existing PDF and write some text in it using PDFBox, this text is not searchable through Adobe Acrobat. You can check the sample file here : http://www.4shared.com/file/213721656/48b1bbea/search_in_PDFBox_text_fails_sa.html Please

Re: Write rotated text.

2010-02-01 Thread Hesham G.
Very nice :) I have also checked the AddImageToPDF.java example, I needed this too. Thanks for the effort Best regards , Hesham -- Included message : Hi, Hesham G. schrieb: Thanks a lot Andreas ... I will be waiting for this. I've added an example

Re: Write rotated text.

2010-01-25 Thread Hesham G.
: Hi, Hesham G. schrieb: Hello , I've seen I can use PDPage.setRotation(...) to rotate a page through PDFBox, but what I'm trying to do is to draw some rotated text. I want the text to appear rotated 90 degrees. Is that possible ? Yes, it is, but you have to use a suitable text matrix
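
To illustrate what a suitable text matrix for rotated text looks like, here is a minimal sketch in plain Java. A PDF matrix is written as six numbers [a b c d e f], and a counterclockwise rotation by an angle uses a = cos, b = sin, c = -sin, d = cos, with e and f giving the position. The class `TextRotation` and its methods are hypothetical; how the six numbers are handed to PDFBox is not shown here.

```java
// Hypothetical helper, not part of PDFBox.
final class TextRotation {
    /** Returns {a, b, c, d, e, f} for a rotation by degrees, placed at (tx, ty). */
    static double[] matrix(double degrees, double tx, double ty) {
        double r = Math.toRadians(degrees);
        double cos = Math.cos(r);
        double sin = Math.sin(r);
        return new double[] { cos, sin, -sin, cos, tx, ty };
    }

    /** Maps a point (x, y) in text space through the matrix m. */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }
}
```

For 90 degrees the matrix is approximately [0 1 -1 0 tx ty], so a step of one unit along the text baseline moves one unit up the page, and the text reads from bottom to top.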

Write rotated text.

2010-01-20 Thread Hesham G.
Hello , I've seen I can use PDPage.setRotation(...) to rotate a page through PDFBox, but what I'm trying to do is to draw some rotated text. I want the text to appear rotated 90 degrees. Is that possible ? Thanks , Hesham

Re: How to sent font bold or italic ?

2010-01-09 Thread Hesham G.
-- Included message : From: Andreas Lehmkuehler andr...@lehmi.de Sent: Friday, January 08, 2010 10:27 AM To: users@pdfbox.apache.org Subject: Re: How to sent font bold or italic ? Hi, Hesham G. schrieb: Hello , This seems an easy question, but I couldn't find an answer for it yet. I load a font

Re: How to sent font bold or italic ?

2010-01-08 Thread Hesham G.
regards , Hesham -- Included message : -- From: Andreas Lehmkuehler andr...@lehmi.de Sent: Friday, January 08, 2010 10:27 AM To: users@pdfbox.apache.org Subject: Re: How to sent font bold or italic ? Hi, Hesham G