16 (the) -384 (competent) -383 (authority) -383 (has) -384
(the) -383 (right) ] TJ
The text is between the parentheses, and the numbers are used for horizontal
positioning.
BR
Andreas
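A minimal sketch of how the missing spaces could be inferred from such a TJ array, assuming that a large negative adjustment (here at least 200 thousandths of a text unit, an arbitrary threshold) marks a word gap. The class and method names are hypothetical, this is not PDFBox code, and escaped parentheses inside strings are not handled:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TjArrayText {
    // Matches either a literal string "(...)" or a numeric positioning adjustment.
    private static final Pattern TOKEN =
            Pattern.compile("\\(([^()]*)\\)|(-?\\d+(?:\\.\\d+)?)");

    /**
     * Rebuilds the text of a TJ array. A negative number moves the next glyph
     * to the right; if that shift reaches gapThreshold (an assumed value),
     * it is treated as a space between words.
     */
    static String decode(String tjArray, double gapThreshold) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(tjArray);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1));            // text between parentheses
            } else {
                double adjustment = Double.parseDouble(m.group(2));
                if (-adjustment >= gapThreshold && out.length() > 0) {
                    out.append(' ');               // wide gap: assume a word break
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "[ (the) -384 (competent) -383 (authority) -383 (has) ] TJ";
        System.out.println(decode(sample, 200));
    }
}
```

Small adjustments such as kerning pairs stay below the threshold and produce no space. A real extractor would compare the gap with the width of the font's space glyph rather than with a fixed number.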
On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. <heshamgne...@gmail.com> wrote:
> Hello ,
>
> I have a PDF
Andreas,
You're absolutely right. I am testing it now, but it seems very complicated.
I hope there might be another easier solution.
Best regards ,
Hesham
Included message :
"Hesham G." <heshamgne...@g
had a chance to do
its job, such as inferring the missing spaces.
You should follow our PrintTextLocations.java example which shows you how to
get the processed TextPositions from PDFTextStripper. It’s really easy to do.
— John
> On 17 Mar 2016, at 04:44, Hesham G. <heshamgne...@gmail.
Hello ,
I have a PDF file created using LaTeX. I am trying to read and print all
letters in that file using PDFBox, but when doing this all spaces in that file
are ignored. Here is the code I am using:
PDPage page = (PDPage)allPages.get( 0 );
PDStream contents = page.getContents();
if (
and
then use extractText to get the words.
2016-03-17 7:20 GMT-03:00 Hesham G. <heshamgne...@gmail.com>:
Andreas,
That is very helpful.
I can get the x location of each character using TextPosition.getX(), ex:
W: 102.88399
i: 114.18165
t: 117.660614
h: 121.55801
d: 133.09477
u: 140.
Included message :
On 17.03.2016 at 07:12, Hesham G. wrote:
Hello ,
I have a PDF file created using LaTeX. I am trying to read and print all
letters in that file using PDFBox, but when doing this all spaces in that
file are ignored.
Here's what I get
ws you how to
get the processed TextPositions from PDFTextStripper. It’s really easy to do.
— John
> On 17 Mar 2016, at 04:44, Hesham G. <heshamgne...@gmail.com> wrote:
>
> Andreas,
>
> You're absolutely r
PM, Hesham G. heshamgne...@gmail.com wrote:
Frank ,
I have handled TextPositions using X Y coordinates as you have suggested
to detect new lines. It works fine, but if a sentence is written on 2
lines
I can't detect it. If you know a trick to detect that it will help a lot.
Best regards
foolproof, since things like subscripts and superscripts are out of order
when sorted by Y. Where there are multiple columns, this won't work.
Frank
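A rough sketch of the Y-based line detection discussed here, assuming the glyphs arrive in reading order and that a jump in Y larger than a tolerance starts a new line. `Glyph` is a hypothetical stand-in for TextPosition, and the tolerance value is an assumption. As noted above, this is not foolproof and it does not handle multiple columns:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineGrouper {
    /** Hypothetical stand-in for a TextPosition: one glyph and its origin. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Walks the glyphs in the order given and starts a new line whenever Y
     * differs from the current line's first glyph by more than yTolerance.
     * Small offsets (subscripts, superscripts) stay on the same line.
     */
    static List<String> groupIntoLines(List<Glyph> glyphs, float yTolerance) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean haveLine = false;
        float lineY = 0f;
        for (Glyph g : glyphs) {
            if (!haveLine) {
                lineY = g.y;
                haveLine = true;
            } else if (Math.abs(g.y - lineY) > yTolerance) {
                lines.add(current.toString());
                current.setLength(0);
                lineY = g.y;
            }
            current.append(g.text);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("H", 10f, 700f),
                new Glyph("2", 16f, 698f),   // subscript, slightly lower
                new Glyph("O", 20f, 700f),
                new Glyph("N", 10f, 686f));  // next line
        System.out.println(groupIntoLines(glyphs, 3f));
    }
}
```

Because the glyphs are not sorted by Y, a subscript stays where it occurs in the stream. The weak spot is a line whose first glyph is itself a subscript or superscript, since that glyph then sets the reference Y for the line.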
On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. heshamgne...@gmail.com wrote:
Hello ,
When reading PDF text using TextPosition, is there a way to know
Hello ,
When reading PDF text using TextPosition, is there a way to know if the current
character is a new line character ?
protected void processTextPosition( TextPosition text ) {
System.out.println( text.getCharacter() ); // Prints space if this is a
// new line character in the PDF
Hello ,
When reading a page content in a pdf file using the processTextPosition(...)
method, is there a way to know the current letter’s text color ?
Best regards ,
Hesham
Hello ,
I was wondering whether it is possible to highlight specific text on a page of
a pdf when a link on another page is clicked, so that the link goes to that
page and then highlights the text I want inside it ?
I wonder if JavaScript would be able to do that, for example ?
Best regards ,
Hesham
Hello ,
If I use any of the PDFBox fonts to write text in a PDF, like “TIMES_ROMAN -
HELVETICA - COURIER - SYMBOL”, then those fonts are not embedded inside
the PDF. So if I want to embed the font inside the PDF, should I avoid using the
PDFBox fonts and use my own TTF font files instead ?
Hello ,
While reading a pdf using PDFBox 1.7.1 many spaces are being ignored, so words
are merged together while reading the pdf. You can test a 1-page sample PDF
from here :
http://www.4shared.com/office/yqJGUZn2ce/wrong_space_parsed_sample.html
You can see wrongly read words like :
Tilman ,
I didn't actually test it, but I might try that version.
Best regards ,
Hesham
Included message :
Hi,
Does this also happen with the current version? (1.8.4)
Tilman
On 25.03.2014 13:53, Hesham G wrote
Hello ,
Can I add a link to the pdf using PDFBox that when I click it opens another pdf
or an application in a specific path ?
Best regards ,
Hesham
Hello ,
I have a pdf file that has 300 pages and is 5.7 MB in size. Using PDFBox I add
another 230 pages to it with a lot of text to write inside them. After doing
this the pdf file size becomes 7.65 MB. That is very good, but I also want to
write 1000 of my words inside that pdf as links. In fact I
Done.
I have reported this with a sample file:
https://issues.apache.org/jira/browse/PDFBOX-1552
Best regards ,
Hesham
-
Included message :
Hi,
On 23.03.2013 09:11, Hesham G. wrote:
Andreas ,
Thank you for your answer : )
Should I add
thought I'll provide the reason :-)
BTW Mac preview has the same issue that pdfbox has - so at least we are not
alone.
Maruan Sahyoun
On 21.03.2013 at 12:34, Hesham G. heshamgne...@gmail.com wrote:
Maruan ,
And that is why I have sent this question. The text appears fine in Adobe
Congratulations .. That's good to hear.
Good work fellows : )
Best regards ,
Hesham
-
Included message :
The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 1.8.0. The release is available for download at:
Andreas ,
I apologize for this !
Please download the PDF from here :
https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
Best regards ,
Hesham
-
Included message :
Hi,
On 18.03.2013 15:43, Hesham G. wrote:
Hello
with) but is correctly handled in Adobe Reader.
Kind regards
Maruan Sahyoun
On 21.03.2013 at 07:05, Hesham G. heshamgne...@gmail.com wrote:
Andreas ,
I apologize for this !
Please download the PDF from here :
https://dl.dropbox.com/u/10111483/downloads/pdfbox/pdf_with_uppercase_letters.pdf
Hello ,
I am trying to extract text from a PDF that has some italic text. All the text
is extracted fine, except for the italic text. It appears with strange spaces
between them.
For example Anthropology of Place and Space is extracted as Anthropology of
Place and Space. I don't know what are
-
Included message :
Hi,
On 22.05.2012 15:45, Hesham G. wrote:
Hello ,
This is more of a Java question than a PDFBox question : ) , but I can't
find an answer to it.
I am trying to use a system font file in drawing text to a PDF file using
PDFBox
Hello ,
This is more of a Java question than a PDFBox question : ) , but I can't
find an answer to it.
I am trying to use a system font file in drawing text to a PDF file using
PDFBox, through the method :
PDTrueTypeFont.loadTTF( PDDocument pdfFile, File fontFile );
The problem here is
fine ... Thanks to you Gilad.
Best regards ,
Hesham
-
Included message :
Try the setColour() method of the PDAnnotationLink object (inherited from
PDAnnotation).
On Wed, Mar 7, 2012 at 9:47 AM, Hesham G. heshamgne...@gmail.com wrote:
Hello
Dirk ,
Did you try to use PDFTextStripper.setAverageCharTolerance( float ) ?
Best regards ,
Hesham
-
Included message :
Hello,
I use pdfbox 1.6.0 to extract text from PDFs, which often works fine.
Unfortunately it seems to insert a space
Gilad ,
I agree with Maruan, and be sure to keep the same settings for each bookmark
as they are (Bold, Italic, Color, ...). And be sure this will work for named
destinations.
Best regards ,
Hesham
-
Included message :
Hi Gilad,
what about
can
get the object id/revision of the page it points to, you should be
able to get the page number in the map just like is done with normal
bookmarks.
On Mon, Nov 7, 2011 at 3:23 PM, Hesham G. heshamgne...@gmail.com wrote:
Adam ,
Did you find a way to map a bookmark to a named destination
Adam ,
Did you find a way to map a bookmark to a named destination in a
page (a specific location inside the page) ?
Best regards ,
Hesham
-
Included message :
I've implemented code which splits PDFs based on bookmarks, combines
PDFs, and add
text correctly
from this PDF. (I just ran the org.apache.pdfbox.ExtractText tool).
Mike McCandless
http://blog.mikemccandless.com
On Wed, Oct 26, 2011 at 8:25 AM, Hesham G. heshamgne...@gmail.com wrote:
Hello ,
Is there a way to detect the footnotes section in a PDF file ?
Here
Andreas ,
Thanks for the explanation.
Best regards ,
Hesham
-
Included message :
Hi,
On 01.09.2011 05:50, Hesham G. wrote:
Mirko ,
Thanks a lot for your reply.
Shouldn't PDFBox handle those ligatures automatically, as stated in the
previous
typically appear visually too far apart without custom kerning.
HTH,
Mirko
On Wed, Aug 31, 2011 at 12:59 PM, Hesham G. heshamgne...@gmail.com
wrote:
Hello ,
I have a PDF whose text I extract using PDFBox. The PDF is read fine
using Mac's Preview, but in PDFBox some words are read in a strange
Andreas ,
Thanks a lot ... Happy to hear that.
Best regards ,
Hesham
-
Included message :
Hi,
On 29.06.2011 06:20, Hesham G. wrote:
Hello ,
I have tried loading the PDF Reference 1.7(1310 pages) using PDFBox
version 1.4 and 1.5.
Version
Hello ,
I have tried loading the PDF Reference 1.7(1310 pages) using PDFBox version 1.4
and 1.5.
Version 1.4 loaded the PDF in 12 seconds.
Version 1.5 loaded the PDF in 2 minutes and 29 seconds.
Is this normal ?
Best regards ,
Hesham
well. Remember, you can also make modifications to the
PDF using a text editor to see what it does (just remember to back up the
original before you go hacking up the PDF or you'll probably end up with a
corrupt document).
Thanks,
Adam
From:
Hesham G. heshamgne...@gmail.com
Hello ,
When drawing some text in a PDF file, can I specify the text opacity ?
I want it to appear dimmed.
Best regards ,
Hesham
it.
Thanks,
Adam
From:
Hesham G. heshamgne...@gmail.com
To:
users@pdfbox.apache.org
Date:
06/16/2011 07:34
Subject:
Re: Is there a difference between Watermark Stamp ?
Adam ,
This post is old, but I am still facing problems using the watermark, and
I need it badly.
I have created
Hello ,
Can I use PDFBox to crop a pdf page ?
I have a PDF with 2 facing pages that I want to show as 2 pages under each
other. I don't want it to be an image, I want each page to have normal text.
You can check this 1 page sample PDF:
Hello ,
I have a PDF that when I try to load using PDFBox 1.5, the following error is
thrown :
WARN - Bad Dictionary Declaration
org.apache.pdfbox.io.PushBackInputStream@16a4a67
You can download the PDF to test it from here:
http://www.4shared.com/document/9ICxOhZW/Dont_make_me_think.html
,
Probably PdfBox can't handle Cross-Reference Streams.
Andrey
-Original Message-
From: Hesham G. [mailto:heshamgne...@gmail.com]
Sent: Saturday, 4 June 2011 16:32
To: pdfbox-send-question
Subject: My pdf can't be loaded by PDFBox
Hello ,
I have a PDF that when I try to load
Mohit ,
This is outside the scope of your question, but I hope you can explain to me
what exactly Tika does ?
I have read its description and it is useful in text extraction, but I still
do not understand what it adds. Why don't you use PDFBox
directly ?
Best regards ,
Hesham
Hello ,
I am using PDFBox version 1.4 to extract text from a PDF, but all the words
having fi inside them are extracted wrong. You can test the following 1 page
PDF sample: http://www.4shared.com/document/GAMnpE9A/the_fi_char.html
I am aware of the post:
Hello ,
I have a long link that appears on 2 lines in a PDF. When I use PDFBox v1.5 to
parse this PDF: pageData = stripper.getText( pdfFile );
it reads the link on 2 lines, but if I copy the link manually from Adobe it is
copied on 1 line.
- 1 page sample PDF:
Jukka ,
As always ... Sorry for being late to reply.
I have just tested this now ... And it extracts the text just fine.
Best regards ,
Hesham
-
Included message :
Hi,
On 04/02/2011 03:24 PM, Hesham G. wrote:
I have a PDF file that I am
Nikhil ,
Please check this example :
http://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/examples/pdmodel/Annotation.java?revision=924515&view=markup
Hope that helps.
Best regards ,
Hesham
-
Included message :
I have
Hello ,
I am trying to create a link with no borders. The link appears and works
perfectly in Adobe Reader, but in Mac Preview the link appears with a border
around it. Here is my code :
PDAnnotationLink link = new PDAnnotationLink();
PDBorderStyleDictionary border = new
I have now seen that this was fixed before, by including the ICU4J library ...
Which is now automatically included in PDFBox 1.4 ... And I was wondering why
PDFBox 1.4 size was that big
Thanks to the PDFBox guys.
Best regards ,
Hesham
-
Included
Hello ,
I am still upgrading from PDFBox 0.73 to PDFBox 1.4. The new version is very
nice, with better extraction results, and more PDFs work fine with it. But I have
noticed that the extraction performance of the new version is much slower than
that of version 0.73.
For example I have tested extracting the
Did anybody test the file ?
It is an important issue to me; I will decide whether or not to upgrade to
PDFBox 1.4 based on it.
I hope someone checks it.
Best regards ,
Hesham
-
Included message :
Hello ,
I have used PDFBox v1.2.1 to extract text from a
Hello ,
I have used PDFBox v1.2.1 to extract text from a PDF file, and it works
perfectly. But now I have tested it with PDFBox v1.4 and most of the text is not
extracted.
You can test this 1 page pdf file for this :
http://www.4shared.com/document/LgGk-OHi/data_not_extracted.html
Best regards
I have noticed the method PDFTextStripper.inspectFontEncoding(...). Would this
help with that problem ?
Best regards ,
Hesham
-
Included message :
Hello ,
I am using PDFBox 1.4 to extract text from an Arabic PDF file. The problem with
Arabic is
Hello ,
While parsing a PDF file using PDFBox v1.2.1 I get a warning. When I googled
for this warning I saw that it was a bug that was fixed in v0.7.3 ... But it
seems to still exist.
You can download the PDF file from here :
http://www.4shared.com/document/BL3eiOu7/expected_hex_character.html
Here is
I know this has been asked before, but I can't find its answer yet !
I am upgrading to PDFBox v1.4 and I need to disable the logs printed in the
console, as it slows the code a lot while parsing the PDF.
How can I do this please ?
Best regards ,
Hesham
-
Included message :
Hi,
On 30.12.2010 20:56, Hesham G. wrote:
I know this has been asked before, but I can't find its answer yet !
I am upgrading to PDFBox v1.4 and I need to disable the logs printed in the
console, as it slows the code a lot while parsing the PDF
in the logging.properties file will help, from
.level=INFO
to something like:
.level=WARN
or
.level=ERROR
Check out the tutorial at http://logging.apache.org/log4j/1.2/manual.html
--Ken
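The same effect can be sketched in code, assuming PDFBox logs through commons-logging and that commons-logging falls back to java.util.logging because log4j is not on the classpath. Note that java.util.logging names its levels WARNING and SEVERE rather than WARN and ERROR. The class name is hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietPdfBoxLogs {
    // Keep a strong reference so the configured logger is not garbage-collected.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Raises the threshold for everything under org.apache.pdfbox to SEVERE. */
    static void silence() {
        PDFBOX_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silence();
        // Child loggers with no level of their own inherit the parent's level.
        Logger child = Logger.getLogger("org.apache.pdfbox.pdfparser.PDFParser");
        System.out.println(child.isLoggable(Level.WARNING));
    }
}
```

Calling silence() once before parsing should be enough under those assumptions. If log4j is present instead, the level has to be set in the log4j configuration.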
On Dec 30, 2010, at 4:40 PM, Hesham G. wrote:
Thanks Andreas.
I have put the properties file
Is your problem related to this :
https://issues.apache.org/jira/browse/PDFBOX-708
Best regards ,
Hesham
-
Included message :
Hello,
I am trying to extract text from a set of PDF files. I keep getting the
following error for some of the files.
Kevin ,
Hope you can share what you get with us. I am interested in that too.
Best regards ,
Hesham
-
Included message :
On Thu, Nov 18, 2010 at 1:01 PM, a...@swmc.com wrote:
First, let me make sure I understand your goals correctly. You have
French words ;)- and acrobat pro)
Would you post your code so that I can compare it to mine?
Regards,
Julien
On 24 Oct 2010 at 23:31, Hesham G. heshamgne...@gmail.com wrote:
I think this answer will not convince my customers :)
Ok, I hope someone will check this. If I can give any help, I'd
Well, you can use the predefined PDFBox fonts, like this :
PDSimpleFont font = PDType1Font.TIMES_ITALIC;
Or you can define your own font file, ex :
PDTrueTypeFont font = PDTrueTypeFont.loadTTF( pdfFile, new File(
"fonts/georgia.TTF" ) );
But replace this font file in the example with a bold or
Hello ,
I have 2 PDFs that when I try to load them using PDFBox v1.2.1 I get 2
different exceptions.
I thought to report this to you if you wanna test them.
Here is the first PDF:
http://www.4shared.com/document/AvX3PW-e/PDF1_causing_exception_with_PD.html
And here is its exception :
effectively be the
same as clearing it out. Then you can use the old PDOutlineItem to copy
over the children that you do want to keep.
Thanks,
Adam
From:
Hesham G. heshamgne...@gmail.com
To:
pdfbox-send-question users@pdfbox.apache.org
Date:
09/07/2010 05:33
Subject:
Clear
Hello ,
If I have a bookmark node that has many children ... Is there a way to empty
it, so I can refill it again from the beginning or just clear some of those
children ?
Best regards ,
Hesham
I suggest the PDFBox team release a light version of its jar.
I see a package inside the new PDFBox jar for version 1.2.1 with the path
'com.ibm.icu' ... I wonder what that is for !
Best regards ,
Hesham
-
Included message :
We're looking to use
Kevin ,
You said you were having problems setting a named destination. I have tried to
define a bookmark named destination but it didn't work.
Here is my code :
PDNamedDestination pn = new PDNamedDestination();
pn.setNamedDestination( "G917701" );
bookmark.setDestination( pn );
When I open the
Hello ,
I am using PDDocument.importPage(...) to copy some pages from one PDF to
another. There were named destinations defined inside the source PDF pages, but
after copying the pages using PDFBox, the named destinations disappeared in the
output PDF, and so I can't define bookmarks for them.
Here is
tho. I am sorry, but I am going out of town for a week. But I
will post on this when I get back. Good luck.
Kevin
On Wed, Sep 1, 2010 at 5:48 AM, Hesham G. heshamgne...@gmail.com wrote:
Kevin ,
You said you were having problems setting a named destination. I have tried
to define
Hello ,
I have noticed that whenever I create a new page in an existing PDF and write
some text in it using PDFBox, this text is not searchable through Adobe Acrobat.
You can check the sample file here :
http://www.4shared.com/file/213721656/48b1bbea/search_in_PDFBox_text_fails_sa.html
Please
Very nice :)
I have also checked the AddImageToPDF.java example, I needed this too.
Thanks for the effort
Best regards ,
Hesham
--
Included message :
Hi,
Hesham G. wrote:
Thanks a lot Andreas ... I will be waiting for this.
I've added an example
:
Hi,
Hesham G. wrote:
Hello ,
I've seen I can use PDPage.setRotation(...) to rotate a page through
PDFBox, but what I'm trying to do is to draw some rotated text. I want
the text to appear rotated 90 degrees.
Is that possible ?
Yes, it is, but you have to use a suitable text matrix
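As a rough illustration of what such a text matrix looks like, here is a sketch that computes the six PDF matrix values [a b c d e f] for a rotation followed by a translation. It uses plain math only. The class and method names are hypothetical, and how the values are passed to PDFBox depends on the version you use:

```java
class RotatedTextMatrix {
    /**
     * Returns {a, b, c, d, e, f} for text rotated counter-clockwise by the
     * given angle and placed with its origin at (tx, ty):
     * a = cos, b = sin, c = -sin, d = cos, e = tx, f = ty.
     */
    static double[] textMatrix(double degrees, double tx, double ty) {
        double theta = Math.toRadians(degrees);
        double cos = Math.cos(theta);
        double sin = Math.sin(theta);
        return new double[] { cos, sin, -sin, cos, tx, ty };
    }

    public static void main(String[] args) {
        double[] m = textMatrix(90, 100, 200);
        // For 90 degrees this is approximately 0 1 -1 0 100 200.
        for (double v : m) {
            System.out.println(v);
        }
    }
}
```

With a 90 degree angle the text baseline runs upwards from (tx, ty). The cosine comes out as a tiny non-zero number rather than an exact 0, so it is worth rounding the values before writing them into a content stream.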
Hello ,
I've seen I can use PDPage.setRotation(...) to rotate a page through PDFBox,
but what I'm trying to do is to draw some rotated text. I want the text to
appear rotated 90 degrees.
Is that possible ?
Thanks ,
Hesham
--
Included message :
From: Andreas Lehmkuehler andr...@lehmi.de
Sent: Friday, January 08, 2010 10:27 AM
To: users@pdfbox.apache.org
Subject: Re: How to sent font bold or italic ?
Hi,
Hesham G. wrote:
Hello ,
This seems an easy question, but I couldn't find an answer for it yet. I
load a font
regards ,
Hesham
--
Included message :
--
From: Andreas Lehmkuehler andr...@lehmi.de
Sent: Friday, January 08, 2010 10:27 AM
To: users@pdfbox.apache.org
Subject: Re: How to sent font bold or italic ?
Hi,
Hesham G
73 matches